HAEMOPHILUS ADHESION PROTEINS



The U.S. Government has certain rights in this invention pursuant to grant numbers
AI-21707 and HD-29687 from the National Institutes of Health.

FIELD OF THE INVENTION 

The invention relates to novel Haemophilus adhesion proteins, nucleic acids, and
antibodies.

BACKGROUND OF THE INVENTION 

Most bacterial diseases begin with colonization of a particular mucosal surface
(Beachey et al., 1981, J. Infect. Dis. 143:325-345). Successful colonization requires
that an organism overcome mechanical cleansing of the mucosal surface and evade
the local immune response. The process of colonization is dependent upon
specialized microbial factors that promote binding to host cells (Hultgren et al.,
1993, Cell 73:887-901). In some cases the colonizing organism will subsequently
enter (invade) these cells and survive intracellularly (Falkow, 1991, Cell
65:1099-1102).

Haemophilus influenzae is a common commensal organism of the human respiratory
tract (Kuklinska and Kilian, 1984, Eur. J. Clin. Microbiol. 3:249-252). It is the most
common [...] leading cause of other invasive diseases. In addition, this organism is
responsible for [...] ([...] Proc. Natl. Acad. Sci. U.S.A. [...]:960-962; St. Geme et al.,
1993, [...]; St. Geme and Falkow, [...] Infect. Immun. 59:3366-[...]). [...] is an
important cause of both localized respiratory tract and systemic disease ([...], 1984,
J. Med. [...]). Nonencapsulated, non-typable strains account for the majority [...]
capsule composed [...] responsible for over 95% of cases [...] ([...], 1982, Clinical
importance of H. influenzae [...] S.H. Sell and P.F. Wright [...] Elsevier/North-Holland
Publishing Co., New York).

The initial step in the pathogenesis of disease due to H. influenzae involves
colonization of the upper respiratory mucosa (Murphy et al., 1987, J. Inf[...]). [...]
persist for weeks to months ([...], 1986, J. Infect. Dis. [...]:100-109). [...] resulting
in local [...]; on occasion bacteria will penetrate the nasopharyngeal epithelial
barrier and enter the bloodstream.

In vitro observations and animal studies suggest that bacterial surface appendages
called pili (or fimbriae) play an important role in H. influenzae colonization. In
1982 two groups reported a correlation between piliation and increased attachment
to human oropharyngeal epithelial cells and erythrocytes (Guerina et al., supra;
Pichichero et al., supra). Other investigators have demonstrated that anti-pilus
antibodies block in vitro attachment by piliated H. influenzae (Forney et al., 1992,
J. Infect. Dis. 165:464-470; van Alphen et al., 1988, Infect. Immun. 56:1800-1806).
Recently Weber et al. insertionally inactivated the pilus structural gene in an
H. influenzae type b strain and thereby eliminated expression of pili; the resulting
mutant exhibited a reduced capacity for colonization of year-old monkeys (Weber
et al., 1991, Infect. Immun. 59:4724-4728).

A number of reports suggest that nonpilus factors also facilitate Haemophilus
colonization. Using the human nasopharyngeal organ culture model, Farley et al.
(1986, J. Infect. Dis. 161:274-280) and Loeb et al. (1988, Infect. Immun. 49:484-489)
noted that nonpiliated type b strains were capable of mucosal attachment. Read
and coworkers made similar observations upon examining nontypable strains in
a model that employs nasal turbinate tissue in organ culture (1991, J. Infect. Dis.
163:549-558). In the monkey colonization study by Weber et al. (1991, supra),
nonpiliated organisms retained a capacity for colonization, though at reduced
densities; moreover, among monkeys originally infected with the piliated strain,
virtually all organisms recovered from the nasopharynx were nonpiliated. All of
these observations are consistent with the finding that nasopharyngeal isolates from
children colonized with H. influenzae are frequently nonpiliated (Mason et al., 1985,
Infect. Immun. 49:98-103; Brinton et al., 1989, Pediatr. Infect. Dis. J. 8:554-561).

Previous studies have shown that H. influenzae are capable of entering (invading)
cultured human epithelial cells via a pili-independent mechanism (St. Geme and
Falkow, 1990, supra; St. Geme and Falkow, 1991, supra). Although H. influenzae
is not generally considered an intracellular parasite, a recent report suggests that
these in vitro findings may have an in vivo correlate (Forsgren et al., 1994, supra).
Forsgren and coworkers examined adenoids from 10 children who had [...]
removed because of longstanding secretory otitis media or adenoidal hypertrophy.
In all 10 cases there were viable intracellular H. influenzae. Electron microscopy
demonstrated that these organisms were concentrated in the reticular crypt
epithelium and in mac[...]. [...] possibility is that bacterial entry into host cells
provides a mechanism for evasion of the local immune response, thereby allowing
persistence in the respiratory tract.

Thus a vaccine for the therapeutic and prophylactic treatment of Haemophilus
infection is desirable. Accordingly, it is an object of the present invention to provide
for recombinant Haemophilus Adherence (HA) proteins and variants thereof, and
to produce useful quantities of these HA proteins using recombinant DNA
techniques.

It is a further object of the invention to provide recombinant nucleic acids encoding
HA proteins, and expression vectors and host cells containing the nucleic acid
encoding the HA protein.

An additional object of the invention is to provide monoclonal antibodies for the
diagnosis of Haemophilus infection.

A further object of the invention is to provide methods for producing the HA
proteins, and a vaccine comprising the HA proteins of the present invention.
Methods for the therapeutic and prophylactic treatment of Haemophilus infection
are also provided.

SUMMARY OF THE INVENTION 

In accordance with the foregoing objects, the present invention provides recombinant
HA proteins, and isolated or recombinant nucleic acids which encode the HA
proteins of the present invention. Also provided are expression vectors which
comprise DNA encoding a HA protein operably linked to transcriptional and
translational regulatory DNA, and host cells which contain the expression vectors.

The invention also provides methods for producing HA proteins which
comprise culturing a host cell transformed with an expression vector and causing
expression of the nucleic acid encoding the HA protein to produce a recombinant
HA protein.

The invention also includes vaccines for Haemophilus influenzae infection 
comprising an HA protein for prophylactic or therapeutic use in generating an 
immune response in a patient. Methods of treating or preventing Haemophilus 
influenzae infection comprise administering a vaccine. 



BRIEF DESCRIPTION OF THE DRAWINGS 
Figures 1A, 1B, and 1C depict the nucleic acid sequence of HA1.

Figure 2 depicts the amino acid sequence of HA1.

Figures 3A, 3B, 3C, 3D, 3E, 3F and 3G depict the nucleic acid sequence and amino
acid sequence of HA2.

Figure 4 shows the schematic alignment of HA1 and HA2. Regions of sequence
similarity are indicated by shaded, striped, and open bars, corresponding to
N-terminal domains, [...] solid circles represent a conserved Walker box
ATP-binding motif (GINVSGKT). Numbers above the bars refer to amino acid
residue positions in the full-length protein. Numbers in parentheses below the HA2
bars represent percent similarity/percent identity between these domains and the
corresponding HA1 domains. The regions of HA2 defined by amino acid residues
51 to 173, 609 to 846, and 1292 to 1475 show minimal similarity to amino acids
51 to 220 of HA1.

Figure 5 depicts the homology between the N-terminal amino acid sequences of
HA1 and HA2. Single letter abbreviations are used for the amino acids. A line
indicates identity between the residues, and two dots indicate conservative changes,
i.e. similarity between residues.

Figure 6 depicts the restriction maps of phage [...] and plasmid pT7-7 subclones.

Figure 7 depicts the restriction map of pDC400 and derivatives. pDC400 contains
a 9.1 kb insert from strain C54 cloned into pUC19. Vector sequences are represented
by hatched boxes. Letters above the top horizontal line indicate restriction enzyme
sites: Bg, BglII; E, EcoRI; H, HindIII; P, PstI; S, [...]; Ss, [...]; X, XbaI. The heavy
horizontal line with arrow represents the location of the hsf locus within pDC400
and the direction of transcription. The striated horizontal line represents the 3.3
kb intragenic fragment used as a probe for Southern analysis. The plasmid pDC602,
which is not shown, contains the same insert as pDC601, but in the opposite
orientation.

Figure 8 shows the identification of plasmid-encoded proteins using the
bacteriophage T7 expression system. Bacteria were radiolabeled with
trans-[35S]-label, and whole cell lysates were resolved on a 7.5%
SDS-polyacrylamide gel. Proteins were visualized by autoradiography. Lane 1,
E. coli BL21(DE3)/pT7-7 uninduced; lane 2, BL21(DE3)/pT7-7 induced; lane 3,
BL21(DE3)/pDC602 uninduced; lane 4, BL21(DE3)/pDC602 induced; lane 5,
BL21(DE3)/pDC601 uninduced; lane 6, BL21(DE3)/pDC601 induced. The
plasmids pDC602 and pDC601 are derivatives of pT7-7 that contain the 8.3 kb XbaI
fragment from pDC400 in opposite orientations. The asterisk indicates the
overexpressed protein in BL21(DE3)/pDC601.

Figure 9 depicts the Southern analysis of chromosomal DNA from H. influenzae
strains C54 and 11, probing with HA2 versus HA1. DNA fragments were separated
on a 0.7% agarose gel and transferred bidirectionally to nitrocellulose membranes
prior to probing with either HA1 or HA2. Lane 1, C54 chromosomal DNA digested
with BglII; lane 2, C54 chromosomal DNA digested with ClaI; lane 3, C54
chromosomal DNA digested with PstI; lane 4, 11 chromosomal DNA digested with
BglII; lane 5, 11 chromosomal DNA digested with ClaI; lane 6, 11 chromosomal
DNA digested with XbaI. A. Hybridization with the 3.3 kb PstI-BglII intragenic
fragment of HA2 from strain C54. B. Hybridization with the 1.6 kb StyI-SspI
intragenic fragment of HA1 from strain 11.

Figure 10 depicts the comparison of cellular binding specificities of E. coli DH5α
harboring HA2 versus HA1. Adherence was measured after incubating bacteria
with eucaryotic cell monolayers for 30 minutes as described and was calculated
by dividing the number of adherent colony forming units by the number of
inoculated colony forming units (St. Geme et al., 1993). Values are the mean ±
SEM of measurements made in triplicate from representative experiments. The
plasmid pDC601 contains the HA2 gene from H. influenzae strain C54, while
pHMW8-5 contains the HA1 gene from nontypable H. influenzae strain 11. Both
pDC601 and pHMW8-5 were prepared using pT7-7 as the cloning vector.

Figure 11 depicts the comparison of the N-terminal extremities of HA2, HMW1,
HMW2, AIDA-I, Tsh, and SepA. The N-terminal sequence of HA2 is aligned with
those of HA1 (Barenkamp, S.J., and J.W. St. Geme, III. Identification of a second
family of high molecular weight adhesion proteins expressed by nontypable
Haemophilus influenzae. Mol. Microbiol., in press), HMW1 and HMW2 ([...]
analysis of genes encoding nontypable Haemophilus influenzae high-molecular-weight
surface-exposed proteins related to filamentous hemagglutinin of Bordetella
pertussis. Infect. Immun. 60:1302-1313), AIDA-I (Benz, I., and M.A. Schmidt.
1992. AIDA-I, the adhesin involved in diffuse adherence of the diarrhoeagenic
Escherichia coli strain 2787 (O126:H27), is synthesized via a precursor molecule.
Mol. Microbiol. 6:[...]-1546), Tsh (Provence, D.L., and R. Curtiss III. 1994. Isolation
and characterization of a gene involved in hemagglutination by an avian pathogenic
Escherichia coli strain. Infect. Immun. 62:1369-1380), and SepA
(Benjelloun-Touimi, Z., P.J. Sansonetti, and C. Parsot. 1995. SepA, the major
[...] in tissue invasion. Mol. Microbiol. 17:123-135). A consensus sequence is shown
on the lower line.

Figure 12 depicts the Southern analysis of chromosomal DNA from [...]
epidemiologically distinct strains of H. influenzae type b. Chromosomal DNA was
digested with [...], separated on a 0.7% agarose gel, transferred to nitrocellulose,
and probed with the 3.3 kb PstI-BglII intragenic fragment of hsf from strain C54.
Lane 1, strain C54; lane 2, strain 108[...]; lane 3, strain 1065; lane 4, strain 1[...]; lane
5, strain 1060; lane 6, strain 1053; lane 7, strain 1063; lane 8, strain 1069; lane 9,
strain 1070; lane 10, strain 1076; lane 11, strain 1084.

Figure 13 depicts the Southern analysis of chromosomal DNA from non-type b
encapsulated strains of H. influenzae. Chromosomal DNA was digested with [...],
separated on a 0.7% agarose gel, transferred to nitrocellulose, and probed with the
3.3 kb PstI-BglII intragenic fragment of hsf from strain C54. Lane 1, SM4 (type
a); lane 2, SM72 (type c); lane 3, SM6 (type d); lane 4, Rd (type d); lane 5, SM7
(type e); lane 6, 142 (type e); lane 7, 327 (type e); lane 8, 351 (type e); lane 9, 134
(type f); lane 10, 219 (type f); lane 11, 346 (type f); lane 12, 503 (type f).

Figures 14A and 14B are the nucleic acid sequence of HA3. 

Figure 15 is the amino acid sequence of HA3.

Figures 16A and 16B depict the homology between the amino acid sequences of
HA1 and HA3. Single letter abbreviations are used for the amino acids. A line
indicates identity between the residues, and two dots indicate conservative changes,
i.e. similarity between residues.

DETAILED DESCRIPTION OF THE INVENTION 

The present invention provides novel Haemophilus Adhesion (HA) proteins. In
a preferred embodiment, the HA proteins are from Haemophilus, and in the
preferred embodiment, from H. influenzae. In particular, H. influenzae
encapsulated type b strains are used to clone the HA proteins of the invention.
However, using the techniques outlined below, HA proteins from other Haemophilus
influenzae strains, or from other bacterial species such as Neisseria spp. or
Bordetella spp. may also be obtained.

Three HA proteins, HA1, HA2 and HA3, are depicted in Figures 2, 3 and 15,
respectively. HA2 is associated with the formation of surface fibrils, which are
involved in adhesion to various host cells. HA1 has also been implicated in adhesion
to a similar set of host cells. When the HA1 or HA2 nucleic acid is expressed in
a nonadherent strain of E. coli, the E. coli acquire the ability to adhere to human
host cells. It should be noted that in the literature, HA1 is referred to as hia ([...])
and HA2 as hsf (Haemophilus surface fibrils).

A HA protein may be identified in several ways. A HA nucleic acid or HA protein
is initially identified by [...] nucleic acid [...]. [...] protein is a "HA protein" if the
overall homology of the protein [...] sequence shown in Figure 2 and/or Figure 3
and/or Figure 15 is preferably greater than [...], more preferably greater than
about 65% and most preferably [...]. In some embodiments the [...] protein has at
[...] of HA1, HA2 and HA3 is cons[...] such as the Best Fit sequence [...] using
[...] techniques [...] BLASTX program (Altschul et al., [...]). The [...] be aligned.
As noted below, in the comparison of prote[...] HA3 with HA2, the homology is
determined on the basis of the [...] sequence.
In a preferred embodiment, a HA protein is defined as having significant homology
to either the N-terminal region or the C-terminal region, or both, of the HA1, HA2
and HA3 proteins depicted in Figures [...]. [...] 50 amino acids is virtually identical
as between HA1 and HA3 ([...] homology),
and as between either HA1 or HA3 and HA2 is 74%. As shown in Figure 11, the
first 24 amino acids of the N-terminus of HA1 and HA2 have limited homology to
several other proteins, but this homology is 50% or less. Thus, a HA protein may
be defined as having homology to the N-terminal region of at least about 60%,
preferably at least about 70%, and most preferably at least about 80%, with
homology as high as 90 or 95% especially preferred. Similarly, the C-terminal
region of at least about 75, preferably 100 and most preferably 125 amino acid
residues is also highly homologous and can be used to identify a HA protein. As
shown in Figure 16, the homology between the C-terminal 120 or so amino acids
of HA1 and HA3 is about 98%, and as between either HA1 or HA3 and HA2 is
also about 98%. Thus homology at the C-terminus is a particularly useful way of
identifying a HA protein. Accordingly, a HA protein can be defined as having
homology to the C-terminal region of at least about 60%, preferably at least about
70%, and most preferably at least about 80%, with homology as high as 90 or 95%
especially preferred. In a preferred embodiment, the HA protein has homology
to both the N- and C-terminal regions.

In addition, a HA protein may be identified as containing at least one stretch of
amino acid homology found at least in the HA1 and HA2 proteins as depicted in
Figure 4. HA2 contains three separate stretches of amino acids (174 to 608, 847
to 1291, and 1476 to 1914, respectively) that show significant homology to the
region of HA1 defined by amino acids 221 to 658.
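
The residue ranges quoted for HA2 here and in the description of Figure 4 can be cross-checked with simple arithmetic. The following is a minimal sketch for illustration only: the variable and function names are assumptions, and the coordinates are copied from the text rather than derived from the sequences of Figures 2 and 3.

```python
# Residue ranges (1-based, inclusive) exactly as quoted in the text.
HA1_HOMOLOGOUS_REGION = (221, 658)
HA2_HOMOLOGOUS_STRETCHES = [(174, 608), (847, 1291), (1476, 1914)]
HA2_MINIMAL_SIMILARITY = [(51, 173), (609, 846), (1292, 1475)]


def span_length(region):
    """Number of residues in an inclusive (start, end) range."""
    start, end = region
    return end - start + 1


def is_contiguous(regions):
    """True if the sorted ranges follow one another with no gap or overlap."""
    ordered = sorted(regions)
    return all(prev[1] + 1 == nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


print("HA1 region length:", span_length(HA1_HOMOLOGOUS_REGION))
for stretch in HA2_HOMOLOGOUS_STRETCHES:
    print("HA2 stretch", stretch, "length:", span_length(stretch))
print(
    "HA2 ranges tile residues 51-1914:",
    is_contiguous(HA2_HOMOLOGOUS_STRETCHES + HA2_MINIMAL_SIMILARITY),
)
```

Under these coordinates each of the three HA2 stretches is close in length to the HA1 region, and the homologous and minimally similar HA2 ranges alternate without gaps from residue 51 to residue 1914.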

The HA proteins of the present invention have limited homology to the high
molecular weight protein-1 (HMW1) of H. influenzae, as well as the AIDA-I adhesin
of E. coli. For the HMW1 protein, this homology is greatest between residues
60-540 of the HA1 protein and residues 1100 to about 1550 of HMW1, with 20%
homology in this overlap region. For the AIDA-I protein, there is a roughly 50%
homology between the first 30 amino acids of AIDA-I and HA1, and the overall
homology between the proteins is roughly 22%.

In addition, the HA1, HA2 and HA3 proteins of the present invention have
homology to each other, as shown in Figures 4, 5 and 16. As between HA1 and
HA2 the homology is 81% similarity and 72% identity overall. HA3 and HA1
are 51% identical and 65% similar. Thus, for the purposes of the invention, HA1,
HA2 and HA3 are all HA proteins.

An "HA1" protein is defined by substantial homology to the sequence shown in
Figure 2. This homology is preferably greater than about 60%, more preferably
greater than about 70% and most preferably greater than 80%. In preferred
embodiments the homology will be as high as about 90 to 95 or 98%. Similarly,
an "HA2" protein may be defined by the same substantial homology to the sequence
shown in Figure 3, and a "HA3" protein is defined with reference to Figure 15, as
defined above.






In addition, for sequences which contain either more or fewer amino acids than the
proteins shown in Figures 2, 3 and 15, it is understood that the percentage of
homology will be determined based on the number of homologous amino acids
in relation to the total number of amino acids. Thus, for example, homology of
sequences shorter than that shown in Figures 2, 3 and 15, as discussed below, will
be determined using the number of amino acids in the shorter sequence.
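
The rule that homology is computed relative to the shorter sequence can be illustrated with a short calculation. The sketch below is an assumption-laden illustration, not the method of the Best Fit or BLASTX programs: it counts identical positions only (not conservative substitutions), expects two rows of an existing pairwise alignment, and uses made-up toy peptides rather than residues from Figures 2, 3 or 15.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues, relative to the shorter ungapped sequence.

    Both arguments are rows of one pairwise alignment and must be equally long.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Count columns where both rows carry the same residue (gaps never count).
    identical = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    # Use the residue count of the shorter sequence as the denominator.
    shorter = min(
        len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, ""))
    )
    if shorter == 0:
        raise ValueError("sequences must contain at least one residue")
    return 100.0 * identical / shorter


# Hypothetical toy alignment: a 10-residue sequence against an 8-residue one.
print(percent_identity("MNKIFNVIWN", "MNKI--VIWN"))  # 8 identical of 8 residues
print(percent_identity("MNKIFNVIWN", "MNRI--VIWS"))  # 6 identical of 8 residues
```

With this denominator, a fragment that matches the longer sequence at every one of its own positions scores 100%, which is the behavior the paragraph describes for sequences shorter than those in the figures.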



HA proteins of the present invention may be shorter than the amino acid sequences
shown in Figures 2, 3 and 15. Thus, in a preferred embodiment, included within
the definition of HA proteins are portions or fragments of the sequence shown in
Figures 2, 3 and 15. Generally, the HA protein fragments may range in size from
about 7 amino acids to about 800 amino acids, with from about 15 to about 700
amino acids being preferred, and from about 100 to about 650 amino acids also
preferred. Particularly preferred fragments are sequences unique to HA; these
sequences have particular use in cloning HA proteins from other organisms, to
generate antibodies specific to HA proteins, or for particular use as a vaccine.
Unique sequences are easily identified by those skilled in the art after examination
of the HA protein sequence and comparison to other proteins; for example, by
examination of the sequence alignment shown in Figures 5 and 16. Preferred unique
sequences include the N-terminal region of the HA1, HA2 and HA3 sequences,
comprising roughly 50 amino acids and the C-terminal 120 amino acids, depicted
in Figures 2, 3 and 15. HA protein fragments which are included within the
definition of a HA protein include N- or C-terminal truncations and deletions which
still allow the protein to be biologically active; for example, which still allow
adherence, as described below. In addition, when the HA protein is to be used to
generate antibodies, for example as a vaccine, the HA protein must share at least
one epitope or determinant with the sequences shown in Figures 2, 3 and 15. In
a preferred embodiment, the epitope is unique to the HA protein; that is, antibodies
generated to a unique epitope exhibit little or no cross-reactivity with other proteins.
However, cross reactivity with other proteins does not preclude such epitopes or
antibodies for immunogenic or diagnostic uses. By "epitope" or "determinant"
herein is meant a portion of a protein which will generate and/or bind an antibody.
Thus, in most instances, antibodies made to a smaller HA protein will be able to
bind to the full length protein.

In some embodiments, the fragments of the HA protein used to generate antibodies
are small; thus, they may be used as haptens and coupled to protein carriers to
generate antibodies, as is known in the art.



In addition, sequences longer than those shown in Figures 2, 3 and 15 are also
included within the definition of HA proteins.

The HA proteins and nucleic acids of the present invention are preferably
recombinant. As used herein, "nucleic acid" may refer to either DNA or RNA, or
molecules which contain both deoxy- and ribonucleotides. The nucleic acids include
genomic DNA, cDNA and oligonucleotides including sense and anti-sense nucleic
acids. Specifically included within the definition of nucleic acid are anti-sense
nucleic acids. An anti-sense nucleic acid will hybridize to the corresponding
non-coding strand of the nucleic acid sequences shown in Figures 1, 3 and 14, but may
contain ribonucleotides as well as deoxyribonucleotides. Generally, anti-sense
nucleic acids function to prevent expression of mRNA, such that a HA protein is
not made, or made at reduced levels. The nucleic acid may be double stranded,
single stranded, or contain portions of both double stranded or single stranded
sequence. By the term "recombinant nucleic acid" herein is meant nucleic acid,
originally formed in vitro by the manipulation of nucleic acid by endonucleases,
in a form not normally found in nature. Thus an isolated HA protein gene, in a linear
form, or an expression vector formed in vitro by ligating DNA molecules that are
not normally joined, are both considered recombinant for the purposes of this
invention; i.e. the HA nucleic acid is joined to other than the naturally occurring
Haemophilus chromosome in which it is normally found. It is understood that once
a recombinant nucleic acid is made and reintroduced into a host cell or organism,
it will replicate non-recombinantly, i.e. using the in vivo cellular machinery of the
host cell rather than in vitro manipulations; however, such nucleic acids, once
produced recombinantly, although subsequently replicated non-recombinantly, are
still considered recombinant for the purposes of the invention.

Similarly, a "recombinant protein" is a protein made using recombinant techniques,
i.e. through the expression of a recombinant nucleic acid as depicted above. A
recombinant protein is distinguished from naturally occurring protein by at least
one or more characteristics. For example, the protein may be isolated away from
some or all of the proteins and compounds with which it is normally associated



WO 96/30519 






PCT/US96/04031 






in its wild type host, or found in the absence of the host cells themselves. Thus,
[...] production of a HA protein from one organism in a different organism or host
cell. [...] is normally seen, through the use of an inducible promoter or high
expression promoter, such that the protein is made at increased concentration levels.
Alternatively, the protein may be in a form not normally found in nature, as in the
addition of an epitope tag or amino acid substitutions, insertions and deletions.
Furthermore, although not norm[...] of proteins which are synthesized chemically,
using the sequence information of Figures 2, 3 and 15, are considered recombinant
herein as well.

Also included within the definition of HA protein are HA proteins from other
organisms, which are cloned and expressed as outlined below.

In the case of anti-sense nucleic acid, an anti-sense nucleic acid is defined as one
which will hybridize to all or part of the corresponding non-coding sequence of
the sequences shown in Figures 1, 3 and 14. Generally, the hybridization conditions
used for the determination of anti-sense hybridization will be high stringency
conditions, such as 0.1X SSC at 65°C.
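
Hybridization of an anti-sense nucleic acid to all or part of a target rests on antiparallel base pairing, which can be sketched in a few lines. This is a simplified illustration only: it treats hybridization as perfect Watson-Crick complementarity over the whole oligonucleotide, ignores salt and temperature effects, and uses a made-up target string and hypothetical function names rather than the sequences of Figures 1, 3 and 14.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def find_antisense_sites(target, oligo):
    """Return 0-based start positions in `target` that `oligo` can pair with.

    A site is a stretch of `target` equal to the reverse complement of
    `oligo`, i.e. a perfect antiparallel match with no mismatches.
    """
    site = reverse_complement(oligo)
    target = target.upper()
    return [
        i
        for i in range(len(target) - len(site) + 1)
        if target[i : i + len(site)] == site
    ]


# Hypothetical 15-nt target and a 6-nt oligonucleotide designed against it.
target = "ATGAAACGCATTAGC"
oligo = reverse_complement("AAACGC")
print(oligo, find_antisense_sites(target, oligo))
```

An oligonucleotide that pairs with only part of the target is still reported, which mirrors the "all or part" wording; a practical screen would also have to allow for the limited mismatch tolerated under the stated stringency.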

Once the HA protein nucleic acid is identified, it can be cloned and, if necessary,
its constituent parts recombined to form the entire HA protein nucleic acid. Once
isolated from its natural source, e.g., contained within a plasmid or other vector
or excised therefrom as a linear nucleic acid segment, the recombinant HA protein
nucleic acid can be further used as a probe to identify and isolate other HA protein
nucleic acids. It can also be used as a "precursor" nucleic acid to make modified
or variant HA protein nucleic acids and proteins.

Using the nucleic acids of the present invention which encode HA protein, a variety
of expression vectors are made. The expression vectors may be either
self-replicating extrachromosomal vectors or vectors which integrate into a host genome.
Generally, these expression vectors include transcriptional and translational
regulatory nucleic acid operably linked to the nucleic acid encoding the HA protein.
"Operably linked" in this context means that the transcriptional and translational
regulatory DNA is positioned relative to the coding sequence of the HA protein
in such a manner that transcription is initiated. Generally, this will mean that the
promoter and transcriptional initiation or start sequences are positioned 5' to the
HA protein coding region. The transcriptional and translational regulatory nucleic
acid will generally be appropriate to the host cell used to express the HA protein;
for example, transcriptional and translational regulatory nucleic acid sequences from
Bacillus will be used to express the HA protein in Bacillus. Numerous types of
appropriate expression vectors, and suitable regulatory sequences are known in the
art for a variety of host cells.

In general, the transcriptional and translational regulatory sequences may include,
but are not limited to, promoter sequences, leader or signal sequences, ribosomal
binding sites, transcriptional start and stop sequences, translational start and stop
sequences, and enhancer or activator sequences. In a preferred embodiment, the
regulatory sequences include a promoter and transcriptional start and stop sequences.

Promoter sequences encode either constitutive or inducible promoters. The
promoters may be either naturally occurring promoters or hybrid promoters. Hybrid
promoters, which combine elements of more than one promoter, are also known
in the art, and are useful in the present invention.

In addition, the expression vector may comprise additional elements. For example,
the expression vector may have two replication systems, thus allowing it to be
maintained in two organisms, for example in mammalian or insect cells for
expression and in a procaryotic host for cloning and amplification. Furthermore,
for integrating expression vectors, the expression vector contains at least one
sequence homologous to the host cell genome, and preferably two homologous
sequences which flank the expression construct. The integrating vector may be
directed to a specific locus in the host cell by selecting the appropriate homologous
sequence for inclusion in the vector. Constructs for integrating vectors are well
known in the art.

In addition, in a preferred embodiment, the expression vector contains a selectable
marker gene to allow the selection of transformed host cells. Selection genes are
well known in the art and will vary with the host cell used.

The HA proteins of the present invention are produced by culturing a host cell
transformed with an expression vector containing nucleic acid encoding a HA
protein, under the appropriate conditions to induce or cause expression of the HA
protein. The conditions appropriate for HA protein expression will vary with the
choice of the expression vector and the host cell, and will be easily ascertained by
one skilled in the art through routine experimentation. For example, the use of
constitutive promoters in the expression vector will require optimizing the growth
and proliferation of the host cell, while the use of an inducible promoter requires
the appropriate growth conditions for induction. In addition, in some embodiments,
the timing of the harvest is important. For example, the baculoviral systems used
in insect cell expression are lytic viruses, and thus harvest time selection can be
crucial for product yield.

Appropriate host cells include yeast, bacteria, archaebacteria, fungi, and insect and
animal cells, including mammalian cells. Of particular interest are Drosophila
melanogaster cells, Saccharomyces cerevisiae and other yeasts, E. coli, Bacillus
subtilis, SF9 cells, C129 cells, 293 cells, Neurospora, BHK, CHO, COS, and HeLa
cells, immortalized mammalian myeloid and lymphoid cell lines.

In a preferred embodiment, HA proteins are expressed in bacterial systems.
Bacterial expression systems are well known in the art.

A suitable bacterial promoter is any nucleic acid sequence capable of binding
bacterial RNA polymerase and initiating the downstream (3') transcription of the
coding sequence of HA protein into mRNA. A bacterial promoter has a transcription
initiation region which is usually placed proximal to the 5' end of the coding
sequence. This transcription initiation region typically includes an RNA polymerase
binding site and a transcription initiation site. Sequences encoding metabolic
pathway enzymes provide particularly useful promoter sequences. Examples include
promoter sequences derived from sugar metabolizing enzymes, such as galactose,
lactose and maltose, and sequences derived from biosynthetic enzymes such as
tryptophan. Promoters from bacteriophage may also be used and are known in the
art. In addition, synthetic promoters and hybrid promoters are also useful; for
example, the tac promoter is a hybrid of the trp and lac promoter sequences.
Furthermore, a bacterial promoter can include naturally occurring promoters of
non-bacterial origin that have the ability to bind bacterial RNA polymerase and initiate
transcription.

20 

In addition to a functioning promoter sequence, an efficient ribosome binding site
is desirable. In E. coli, the ribosome binding site is called the Shine-Dalgarno (SD)
sequence and includes an initiation codon and a sequence 3-9 nucleotides in length
located 3-11 nucleotides upstream of the initiation codon.
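
The spacing rule above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the motif pattern (short purine-rich matches to the AGGAGG consensus), the function name find_sd_sites, and the example sequence are all hypothetical and are not taken from this disclosure. The gap is measured here from the 3' end of the motif to the first base of the initiation codon.

```python
import re

# Assumption: short purine-rich cores of the AGGAGG consensus stand in for
# a Shine-Dalgarno-like motif. Real sites are often only partial matches.
SD_CORE = re.compile(r"(?=([AG]GGA|GGAG|GAGG))")


def find_sd_sites(seq, start_codon_index, min_gap=3, max_gap=11):
    """Return (position, motif, gap) for each SD-like motif whose 3' end lies
    min_gap..max_gap nucleotides upstream of the initiation codon."""
    upstream = seq[:start_codon_index].upper()
    hits = []
    for match in SD_CORE.finditer(upstream):
        motif = match.group(1)
        # Number of nucleotides between the motif and the initiation codon.
        gap = start_codon_index - (match.start() + len(motif))
        if min_gap <= gap <= max_gap:
            hits.append((match.start(), motif, gap))
    return hits


# Hypothetical sequence: an AGGAGG stretch followed by an ATG start codon.
example = "TTTAAGGAGGTATACATATGAAA"
start = example.find("ATG")
sites = find_sd_sites(example, start)
```

Overlapping matches are reported separately, so a full AGGAGG stretch yields several hits at neighboring positions; a caller would normally keep the one with the preferred spacing.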

The expression vector may also include a signal peptide sequence that provides
for secretion of the HA protein in bacteria. The signal sequence typically encodes
[...] and leucine biosynthetic pathways.

These components are assembled into expression vectors. Expression vectors for
[...] Streptococcus cremoris, and Streptococcus lividans, among others.

The bacterial expression vectors are transformed into bacterial host cells using
techniques well known in the art, such as calcium chloride treatment,
electroporation, and others.

In one embodiment, HA proteins are produced in insect cells. Expression vectors
for the transformation of insect cells, and in particular, baculovirus-based expression
vectors, are well known in the art. Briefly, baculovirus is a very large DNA virus
which produces its coat protein at very high levels. Due to the size of the baculoviral
genome, exogenous genes must be placed in the viral genome by recombination.
Accordingly, the components of the expression system include: a transfer vector,
usually a bacterial plasmid, which contains both a fragment of the baculovirus
genome, and a convenient restriction site for insertion of the HA protein; a wild
type baculovirus with a sequence homologous to the baculovirus-specific fragment
in the transfer vector (this allows for the homologous recombination of the
heterologous gene into the baculovirus genome); and appropriate insect host cells
and growth media.



Mammalian expression systems are also known in the art and are used in one
embodiment. A mammalian promoter is any DNA sequence capable of binding
mammalian RNA polymerase and initiating the downstream (3') transcription of
a coding sequence for HA protein into mRNA. A promoter will have a transcription
initiating region, which is usually placed proximal to the 5' end of the coding
sequence, and a TATA box, usually located 25-30 base pairs upstream of the
transcription initiation site. The TATA box is thought to direct RNA polymerase
II to begin RNA synthesis at the correct site. A mammalian promoter will also
contain an upstream promoter element, typically located within 100 to 200 base
pairs upstream of the TATA box. An upstream promoter element determines the
rate at which transcription is initiated and can act in either orientation. Of particular
use as mammalian promoters are the promoters from mammalian viral genes, since
the viral genes are often highly expressed and have a broad host range. Examples
include the SV40 early promoter, mouse mammary tumor virus LTR promoter,
adenovirus major late promoter, and herpes simplex virus promoter.

Typically, transcription termination and polyadenylation sequences recognized by
mammalian cells are regulatory regions located 3' to the translation stop codon and
thus, together with the promoter elements, flank the coding sequence. The 3'
terminus of the mature mRNA is formed by site-specific post-transcriptional cleavage
and polyadenylation. Examples of transcription terminator and polyadenylation
signals include those derived from SV40.

The methods of introducing exogenous nucleic acid into mammalian hosts, as well
as other hosts, are well known in the art, and will vary with the host cell used.
[...] protein, using cassette mutagenesis or other techniques well known in the art, to
produce DNA encoding the variant, and thereafter expressing the DNA in
recombinant cell culture as outlined above. However, variant HA protein fragments
having up to about 100-150 residues may be prepared by in vitro synthesis using
established techniques. Amino acid sequence variants are characterized by the
predetermined nature of the variation, a feature that sets them apart from naturally
occurring allelic or interspecies variation of the HA protein amino acid sequence.
The variants typically exhibit the same qualitative biological activity as the naturally
occurring analogue, although variants can also be selected which have modified
characteristics as will be more fully outlined below.

While the site or region for introducing an amino acid sequence variation is
predetermined, the mutation per se need not be predetermined. For example, in
order to optimize the performance of a mutation at a given site, random mutagenesis
may be conducted at the target codon or region and the expressed HA protein
variants screened for the optimal combination of desired activity. Techniques for
making substitution mutations at predetermined sites in DNA having a known
sequence are well known, for example, M13 primer mutagenesis. Screening of the
mutants is done using assays of HA protein activities; for example, mutated HA
genes are placed in HA deletion strains and tested for HA activity, as disclosed
herein. The creation of deletion strains, given a gene sequence, is known in the
art. For example, nucleic acid encoding the variants may be expressed in an
adhesion deficient strain, and the adhesion and infectivity of the variant
Haemophilus influenzae evaluated. For example, as outlined below, the variants
may be expressed in the E. coli DH5α non-adherent strain, and the transformed
E. coli strain evaluated for adherence using Chang conjunctival cells.

Amino acid substitutions are typically of single residues; insertions usually will 
be on the order of from about 1 to 20 amino acids, although considerably larger 
[...]

Substantial changes in function or immunological identity are made by selecting
substitutions that are less conservative than those shown in Chart I. For example,
substitutions may be made which more significantly affect: the structure of the
polypeptide backbone in the area of the alteration, for example the alpha-helical
or beta-sheet structure; the charge or hydrophobicity of the molecule at the target
site; or the bulk of the side chain. The substitutions which in general are expected
to produce the greatest changes in the polypeptide's properties are those in which
(a) a hydrophilic residue, e.g. seryl or threonyl, is substituted for (or by) a
hydrophobic residue, e.g. leucyl, isoleucyl, phenylalanyl, valyl or alanyl; (b) a
cysteine or proline is substituted for (or by) any other residue; (c) a residue having
an electropositive side chain, e.g. lysyl, arginyl, or histidyl, is substituted for (or
by) an electronegative residue, e.g. glutamyl or aspartyl; or (d) a residue having
a bulky side chain, e.g. phenylalanine, is substituted for (or by) one not having a
side chain, e.g. glycine.
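
The four classes (a) to (d) above can be read as simple membership tests on the two residues involved. The following is a minimal sketch of that reading, not a method set out in this disclosure: the residue sets contain only the examples named in the text (given as one-letter codes), and the function and set names are hypothetical.

```python
# Residue sets limited to the examples named in classes (a)-(d) above.
HYDROPHILIC = {"S", "T"}                  # seryl, threonyl
HYDROPHOBIC = {"L", "I", "F", "V", "A"}   # leucyl, isoleucyl, phenylalanyl, valyl, alanyl
CYS_OR_PRO = {"C", "P"}                   # cysteine, proline
ELECTROPOSITIVE = {"K", "R", "H"}         # lysyl, arginyl, histidyl
ELECTRONEGATIVE = {"E", "D"}              # glutamyl, aspartyl
BULKY = {"F"}                             # phenylalanine
NO_SIDE_CHAIN = {"G"}                     # glycine


def _either_direction(x, y, first, second):
    """True if one residue is in `first` and the other in `second`,
    covering the 'substituted for (or by)' wording."""
    return (x in first and y in second) or (x in second and y in first)


def substantial_change_classes(original, replacement):
    """Return the class letters (a)-(d) that a substitution falls into."""
    if original == replacement:
        return []
    classes = []
    if _either_direction(original, replacement, HYDROPHILIC, HYDROPHOBIC):
        classes.append("a")
    if original in CYS_OR_PRO or replacement in CYS_OR_PRO:
        classes.append("b")
    if _either_direction(original, replacement, ELECTROPOSITIVE, ELECTRONEGATIVE):
        classes.append("c")
    if _either_direction(original, replacement, BULKY, NO_SIDE_CHAIN):
        classes.append("d")
    return classes
```

A substitution that falls into none of the classes, such as leucine to isoleucine, returns an empty list; this says only that the pair is outside the listed examples, not that it is conservative in the sense of Chart I.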

The variants typically exhibit the same qualitative biological activity and will elicit
the same immune response as the naturally-occurring analogue, although variants
also are selected to modify the characteristics of the polypeptide as needed.
Alternatively, the variant may be designed such that the biological activity of the
HA protein is altered. For example, the Walker box ATP-binding motif may be
altered or eliminated.

In a preferred embodiment, the HA protein is purified or isolated after expression.
HA proteins may be isolated or purified in a variety of ways known to those skilled
in the art depending on what other components are present in the sample. Standard
purification methods include electrophoretic, molecular, immunological and
chromatographic techniques, including ion exchange, hydrophobic, affinity, and
reverse-phase HPLC chromatography, and chromatofocusing. For example, the
HA protein may be purified using a standard anti-HA antibody column.
[...]

Antibodies generated to HA proteins may also be used in passive immunization
treatments, as is known in the art.

Antibodies generated to unique sequences of HA proteins may also be used to screen
expression libraries from other organisms to find, and subsequently clone, HA
nucleic acids from other organisms.

In one embodiment, the antibodies may be directly or indirectly labelled. By
"labelled" herein is meant a compound that has at least one element, isotope or
chemical compound attached to enable the detection of the compound. In general,
labels fall into three classes: a) isotopic labels, which may be radioactive or heavy
isotopes; b) immune labels, which may be antibodies or antigens; and c) colored
or fluorescent dyes. The labels may be incorporated into the compound at any
position. Thus, for example, the HA protein antibody may be labelled for detection,
or a secondary antibody to the HA protein antibody may be created and labelled.

In one embodiment, the antibodies generated to the HA proteins of the present
invention are used to purify or separate HA proteins or the Haemophilus influenzae
organism from a sample. Thus for example, antibodies generated to HA proteins
which will bind to the Haemophilus influenzae organism may be coupled, using
standard technology, to affinity chromatography columns. These columns can be
used to pull out the Haemophilus organism from environmental or tissue samples.

In a preferred embodiment, the HA proteins of the present invention are used as
vaccines for the prophylactic or therapeutic treatment of a Haemophilus influenzae
infection in a patient. By "vaccine" or "immunogenic compositions" herein is meant
an antigen or compound which elicits an immune response in an animal or patient.
The vaccine may be administered prophylactically, for example to a patient never
previously exposed to the antigen, such that subsequent infection by the
Haemophilus influenzae organism is prevented. Alternatively, the vaccine may
be administered therapeutically to a patient previously exposed or infected by the
Haemophilus influenzae organism. While infection cannot be prevented, in this
case an immune response is generated which allows the patient's immune system
to more effectively combat the infection. Thus, for example, there may be a decrease
or lessening of the symptoms associated with infection.

A "patient" for the purposes of the present invention includes both humans and other 
animals and organisms. Thus the methods are applicable to both human therapy
and veterinary applications.

The administration of the HA protein as a vaccine is done in a variety of ways.
Generally the HA proteins can be formulated according to known methods to
prepare pharmaceutically useful compositions, whereby therapeutically effective
amounts of the HA protein are combined in admixture with a pharmaceutically
acceptable carrier vehicle. Suitable vehicles and their formulation are well known
in the art. Such compositions will contain an effective amount of the HA protein
together with a suitable amount of vehicle in order to prepare pharmaceutically
acceptable compositions for effective administration to the host. The composition
may include salts, buffers, carrier proteins such as serum albumin, targeting
molecules to localize the HA protein at the appropriate site or tissue within the
organism, and other molecules. The composition may include adjuvants as well.

In one embodiment, the vaccine is administered as a single dose; that is, one dose
is adequate to induce a sufficient immune response to prophylactically or
therapeutically treat a Haemophilus influenzae infection. In alternate embodiments,
the vaccine is administered as several doses over a period of time, as a primary
vaccination and "booster" vaccinations.
By "therapeutically effective amounts" herein is meant an amount of the HA protein
which is sufficient to induce an immune response. This amount may be different
depending on whether prophylactic or therapeutic treatment is desired. Generally,
this ranges from about 0.001 mg to about 1 gm, with a preferred range of about 0.05
to about 0.5 gm. These amounts may be adjusted if adjuvants are used.

The following examples serve to more fully describe the manner of using the
above-described invention, as well as to set forth the best modes contemplated for
carrying out various aspects of the invention. It is understood that these examples in
no way serve to limit the true scope of this invention, but rather are presented for
illustrative purposes. All references cited herein are specifically incorporated by
reference.

EXAMPLE 1
Cloning of HA1

Many protocols are substantially the same as those outlined in St. Geme et al., Mol.
Microbiol. 15(1):77-85 (1995).

Bacterial strains, plasmids, and phages.

Nontypable H. influenzae strain 11 was the clinical isolate chosen as a prototypic
HMW1/HMW2-non-expressing strain, although a variety of encapsulated typable
strains can be used to clone the protein using the sequences of the figures. The
organism was isolated in pure culture from the middle ear fluid of a child with acute
otitis media. The strain was identified as H. influenzae by standard methods and
was classified as nontypable by its failure to agglutinate with a panel of typing
antisera for H. influenzae types a to f (Burroughs Wellcome Co., Research Triangle
Park, N.C.) and failure to show lines of precipitation with these antisera in
counterimmunoelectrophoresis assays. Strain 11 adheres efficiently to Chang
conjunctival cells at levels comparable to those previously demonstrated for
NTHI strains expressing HMW1/HMW2-like proteins (data not shown).
Convalescent serum from the child infected with this strain demonstrated [...]
weight proteins with molecular weights greater than 100 kDa.

[...] Mass.). pT7-7 was the kind gift of Stanley Tabor. This vector contains the T7 RNA
polymerase promoter φ10, a ribosome-binding site, and the translational start site
for the T7 gene 10 protein upstream from a multiple cloning site.

Molecular cloning.

The recombinant phage containing the HA1 gene was isolated and characterized
using methods similar to those described previously. In brief, chromosomal DNA
[...] prepared and fractionated on 0.7% agarose gels. Fractions containing DNA
fragments in the [...] to 20-kbp range were pooled, and a library was prepared by
ligation into λEMBL3 arms. Ligation mixtures were packaged [...]
Gigapack (Stratagene) and plate-amplified in a P2 lysogen of E. coli [...]
Press). For plasmid subcloning studies, DNA from recombinant phage was
subcloned into the T7 expression plasmid pT7-7. Standard methods were used for
manipulation of cloned DNA as described by Maniatis et al. (supra).

Plasmid pHMW8-3 was generated by isolating an 11 kbp XbaI fragment from
[...] phage clone 11-17 and ligating into XbaI-cut pT7-7.
Plasmid pHMW8-4 was generated by isolating a 10 kbp BamHI-ClaI [...] cut pT7-7.
Plasmid pHMW8-5 was generated by digesting plasmid pHMW8-3 DNA with ClaI,
isolating the larger fragment and religating. Plasmid pHMW8-6 was generated by
digesting pHMW8-4 with SpeI, which cuts at a unique site within the HA1 gene,
blunt-ending the resulting fragment, and inserting a kanamycin resistance cassette
into the SpeI site. Plasmid pHMW8-7 was generated by digesting pHMW8-3 with Arwl
and Hindll, isolating the fragment containing pT7-7, blunt-ending and religating.
The plasmid restriction maps are shown in Figure 6.

DNA sequence analysis.

DNA sequence analysis was performed by the dideoxy method with the U.S.
Biochemicals Sequenase kit as suggested by the manufacturer. [35S]dATP was
purchased from New England Nuclear (Boston, Mass.). Data were analyzed with
Compugene software and the Genetics Computer Group program from the
University of Wisconsin on a Digital VAX 8530 computer. Several 21-mer
oligonucleotide primers were generated as necessary to complete the sequence.

Adherence assays.
Adherence assays were done with Chang epithelial cells [Wong-Kilbourne
derivative, clone 1-5c-4 (human conjunctiva), ATCC CCL 20.2], which were seeded
into wells of 24-well tissue culture plates, as described (St. Geme III et al., Infect.
Immun. 58:4036 (1990)). Bacteria were inoculated into broth and allowed to grow
to a density of approximately 2 x 10^9 colony-forming units per ml. Approximately
2 x 10^7 colony-forming units were inoculated onto epithelial cell monolayers, and
plates were gently centrifuged at 165 x g for 5 min to facilitate contact between
bacteria and the epithelial surface. After incubation for 30 min at 37°C in 5% CO2,
monolayers were rinsed five times with phosphate buffered saline (PBS) to remove
nonadherent organisms and were treated with trypsin-EDTA (0.05% trypsin/0.5%
EDTA) in PBS to release them from the plastic support. Well contents were
agitated, and dilutions [...] bacteria per monolayer. Percent adherence was
calculated by dividing the number of adherent colony-forming units by the number
of inoculated colony-forming units.
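
The percent adherence calculation, and the mean ± SEM summary used in the adherence tables, can be sketched as follows. This is an illustration only: the function names and the triplicate counts are hypothetical, and the sample standard deviation is assumed for the SEM, which the text does not specify.

```python
from math import sqrt
from statistics import mean, stdev


def percent_adherence(adherent_cfu, inoculated_cfu):
    """Adherent colony-forming units as a percentage of the inoculum."""
    if inoculated_cfu <= 0:
        raise ValueError("inoculated_cfu must be positive")
    return 100.0 * adherent_cfu / inoculated_cfu


def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    if len(values) < 2:
        raise ValueError("at least two measurements are required")
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical triplicate wells, each inoculated with about 2 x 10^7 CFU.
inoculum = 2e7
adherent_counts = [8.0e6, 9.0e6, 1.0e7]
percents = [percent_adherence(count, inoculum) for count in adherent_counts]
average, sem = mean_and_sem(percents)
```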

The nontypable Haemophilus influenzae strain 11 chromosomal DNA library was
screened immunologically with convalescent serum from the child infected with
strain 11. Immunoreactive clones were screened by Western blot for expression
of high molecular weight proteins with apparent molecular weights >100 kDa and
two different classes of recombinant clones were recovered. A single clone
designated 11-17 was recovered which expressed the HA1 protein. The recombinant
protein expressed by this clone had an apparent molecular weight of greater than
200 kDa.

Transformation into E. coli

Plasmids were introduced into the DH5α strain of E. coli (Maniatis, supra), which is
a non-adherent strain, using electroporation (Dower et al., Nucl. Acids Res. 16:6127
(1988)). The results are shown in Table 1.
Table 1

Strain               % Adherence(a)

DH5α(pHMW8-4)        43.3 ± 5.0%
DH5α(pHMW8-5)        41.3 ± 3.3%
DH5α(pHMW8-6)        0.6 ± 0.3%
DH5α(pHMW8-7)        [...]
DH5α(pT7-7)          0.4 ± 0.1%

(a) Adherence was measured in a 30 minute assay and was calculated by dividing the
number of adherent bacteria by the number of inoculated bacteria. Values are the
mean ± SEM of measurements made in triplicate from a representative experiment.

In addition, a monoclonal antibody made by standard procedures and directed against
the strain 11 protein recognized proteins in 57 of 60 epidemiologically-unrelated
NTHI strains. However, Southern analysis using the gene indicated that only roughly
25% of the tested strains actually hybridized to the gene (data not shown).

EXAMPLE 2 
Cloning of HA2 

In a recent study we examined a series of H. influenzae type b isolates by
transmission electron microscopy and visualized short, thin surface fibrils distinct
from pili (St. Geme, J.W., III, and D. Cutter. 1995. Evidence that surface fibrils
expressed by Haemophilus influenzae type b promote attachment to human epithelial
cells. Mol. Microbiol. 15:77-85.). In that study, the large genetic locus involved
in the expression of these appendages was isolated.

Bacterial strains and plasmids
H. influenzae strain C54 is a type b strain that has been described previously [...].
C54-Tn400.23 is a mutant that contains a mini-Tn10 kan element in the hsf locus
and demonstrates minimal in vitro adherence (St. Geme, J.W., III, and D. Cutter.
1995. Evidence that surface fibrils expressed by Haemophilus influenzae type b
promote attachment to human epithelial cells. Mol. Microbiol. 15:77-85.). Strains
1053, 1058, [...], 1063, 1065, 1069, 1070, 1076, 1081, and 1084 are H. influenzae
type b isolates generously provided by J. Musser (Baylor University, Houston,
Texas) (Musser et al. 1990. Global genetic structure and molecular epidemiology
of encapsulated Haemophilus influenzae. Rev. Infect. Dis. 12:75-111.). H. influenzae
strains SM4 (type a), SM6 (type [...]), SM7 (type e), and SM72 (type [...]) are
type strains obtained from R. Facklam at the Centers for Disease Control [...].
Strains 134, 219, 2[...], and 501 are H. influenzae type f isolates obtained from
H. Käyhty (Finnish National Public Health Institute, Helsinki). Strain Rd (type d) and
the 15 nontypable isolates examined by Southern analysis have been described
previously (Alexander et al., J. Exp. Med. 93:345-359 (1951); Barenkamp et al.,
Infect. Immun. 60:1302-1313 (1992)). E. coli DH5α is a nonadherent laboratory
strain that was originally obtained from [...]. E. coli strain BL21(DE3) was
a gift from F.W. Studier and contains a single copy of the T7 RNA polymerase gene
[...] (Studier, F.W., and B.A. Moffatt. 1986. Use of bacteriophage T7 RNA
polymerase to direct high-level expression of cloned genes. J. Mol. Biol.
189:113-130.). Plasmid pT7-7 was provided by S. Tabor and contains the T7 RNA
polymerase promoter φ10, a ribosome-binding site, and the translational start site
for the T7 gene 10 protein upstream from a multiple cloning site (Tabor, S., and
C.C. Richardson. 1985. A bacteriophage T7 RNA polymerase/promoter system
for controlled exclusive expression of specific genes. Proc. Natl. Acad. Sci. USA.
82:1074-1078.). pUC19 is a high-copy-number plasmid
that has been previously described (Yanisch-Perron et al., Gene 33:103-119 (1985)).
pDC400 is a pUC19 derivative that harbors the H. influenzae strain C54 surface
fibril locus and is sufficient to promote in vitro adherence by laboratory strains of
E. coli (St. Geme, J.W., III, and D. Cutter. 1995. Evidence that surface fibrils
expressed by Haemophilus influenzae type b promote attachment to human epithelial
cells. Mol. Microbiol. 15:77-85.). pHMW8-5 is a pT7-7 derivative that contains
the H. influenzae strain 11 hia locus and also promotes adherence by nonadherent
laboratory strains of E. coli (Barenkamp, S.J., and J.W. St. Geme, III. Identification
of a second family of high molecular weight adhesion proteins expressed by
nontypable Haemophilus influenzae. Mol. Microbiol., in press.). pHMW8-6
contains the H. influenzae hia locus interrupted by a kanamycin cassette
(Barenkamp, S.J., and J.W. St. Geme, III. Identification of a second family of high
molecular weight adhesion proteins expressed by nontypable Haemophilus
influenzae. Mol. Microbiol., in press.). pUC4K served as the source of the
kanamycin-resistance gene that was used as a probe in Southern analysis (Vieira,
J., and J. Messing. 1982. The pUC plasmids, an M13mp7-derived system for
insertion mutagenesis and sequencing with synthetic universal primers. Gene
19:259-268.).

Culture conditions 

H. influenzae strains were grown on chocolate agar supplemented with 1% IsoVitaleX,
on brain heart infusion agar supplemented with hemin and NAD (BHI-DB agar),
or in brain heart infusion broth supplemented with hemin and NAD (BHIs)
(Anderson, P., R.B. Johnston, Jr., and D.H. Smith. 1972. Human serum activity
against Haemophilus influenzae type b. J. Clin. Invest. 51:31-38.). These strains
were stored at -80°C in brain heart infusion broth with 25% glycerol. E. coli strains
were grown on Luria Bertani (LB) agar or in LB broth and were stored at -80°C
in LB broth with 50% glycerol. For H. influenzae, kanamycin was used in a
[...] following: ampicillin or carbenicillin 100 µg/ml and kanamycin 50 µg/ml.
Induction of plasmid-encoded proteins

[...] was employed and the relevant pT7-7 derivatives were transformed into E. coli
BL21(DE3). Activation of the T7 promoter was achieved by inducing expression of
T7 RNA polymerase with isopropyl-β-D-thiogalactopyranoside [...]
to a final concentration of 200 µg/ml. Thirty minutes later, 1 ml of culture was
[...] were harvested and whole cell lysates were resuspended in Laemmli buffer for
analysis by sodium dodecyl sulfate-polyacrylamide gel electrophoresis on 7.5%
acrylamide gels (Laemmli, U.K. 1970. Cleavage of structural proteins during the
assembly of the head of bacteriophage T4. Nature (London).).
Autoradiography was performed with Kodak XAR-5 film.
Recombinant DNA methods

[...] performed according to standard techniques (Sambrook, J., E.F. Fritsch, and
T. Maniatis. 1989. Molecular cloning: a laboratory manual. [...]
Laboratory, Cold Spring Harbor, N.Y.). Plasmids were introduced into [...]
(Dower, W.J., J.F. Miller, and C.W. Ragsdale. 1988. High efficiency transformation
of [...]). In H. influenzae, transformation was performed using the MIV method of
Herriott et al. (Herriott, R.M., E.M. Meyer, and M. Vogt. 1970. Defined nongrowth
media for stage II competence in Haemophilus influenzae. J. Bacteriol. 101:517-524.).

Adherence assays 

Adherence assays were performed with tissue culture cells which were seeded into
wells of 24-well tissue culture plates as previously described (St. Geme et al., Infect.
Immun. 58:4036-4044 (1990)). Adherence was measured after incubating bacteria
with epithelial monolayers for 30 minutes as described (St. Geme, J.W., III, S.
Falkow, and S.J. Barenkamp. 1993. High-molecular-weight proteins of nontypable
Haemophilus influenzae mediate attachment to human epithelial cells. Proc. Natl.
Acad. Sci. U.S.A. 90:2875-2879.). Tissue culture cells included Chang epithelial
cells (Wong-Kilbourne derivative, clone 1-5c-4 (human conjunctiva)) (ATCC CCL
20.2), KB cells (human oral epidermoid carcinoma) (ATCC CCL 17), HEp-2 cells
(human laryngeal epidermoid carcinoma) (ATCC CCL 23), A549 cells (human
lung carcinoma) (ATCC CCL 185), Intestine 407 cells (human embryonic intestine)
(ATCC CCL 6), HeLa cells (human cervical epitheloid carcinoma) (ATCC CCL
2), ME-180 cells (human cervical epidermoid carcinoma) (ATCC HTB 33), HEC-1B
cells (human endometrium) (ATCC HTB 113), and CHO-K1 cells (Chinese hamster
ovary) (ATCC CCL 61). Chang, KB, Intestine 407, HeLa, and HEC-1B cells were
maintained in modified Eagle medium with Earle's salts and non-essential amino
acids. HEp-2 cells were maintained in Dulbecco's modified Eagle medium, A549
cells and CHO-K1 cells in F12 medium (Ham), and ME-180 cells in McCoy 5A
medium. All media were supplemented with 10% heat-inactivated fetal bovine
serum.

Southern analysis 

Southern blotting was performed using high stringency conditions as previously
described (St. Geme, J.W., III, and S. Falkow. 1991. Loss of capsule expression by
Haemophilus influenzae type b results in enhanced adherence to and invasion of
human cells. Infect. Immun. 59:1325-1333.).

Microscopy 

Samples of epithelial cells with associated bacteria were stained with Giemsa stain
and examined by light microscopy as described (St. Geme, J.W., III, and S. Falkow.
1990. Haemophilus influenzae adheres to and enters cultured human epithelial
cells. Infect. Immun. 58:4036-4044.).

For negative-staining electron microscopy, bacteria were stained with 0.5% aqueous
uranyl acetate (St. Geme, J.W., III, and S. Falkow. 1991. Loss of capsule expression
by Haemophilus influenzae type b results in enhanced adherence to and invasion
of human cells. Infect. Immun. 59:1325-1333.) and examined using a Zeiss 10A
microscope.

The previous study indicated that laboratory E. coli strains harboring the plasmid
pDC400 were capable of efficient attachment to cultured human epithelial cells
(St. Geme, J.W., III, and D. Cutter. 1995. Evidence that surface fibrils expressed
by Haemophilus influenzae type b promote attachment to human epithelial cells.
Mol. Microbiol. 15:77-85.). Subcloning studies and transposon mutagenesis
indicated that the relevant coding region of pDC400 was present within an 8.3 kb
XbaI fragment (St. Geme, J.W., III, and D. Cutter. 1995. Evidence that surface fibrils
expressed by Haemophilus influenzae type b promote attachment to human epithelial
cells. Mol. Microbiol. 15:77-85.) (Figure 7). To confirm this conclusion, in the
present study this XbaI fragment was subcloned into pT7-7, generating plasmids
designated pDC601 and pDC602, which contained the insert in opposite orientations
(Figure 7). As predicted, expression of these plasmids in E. coli DH5α was
associated with a capacity for high level in vitro attachment (Table 1).
Table 1. Adherence to Chang conjunctival cells.

Strain               % Adherence(a)

DH5α/pT7-7           0.4 ± 0.1
DH5α/pDC400          25.3 ± 1.2
DH5α/pDC601          54.3 ± 7.5
DH5α/pDC602          55.5 ± 4.3
C54bp-               98.7 ± 9.5
C54-HA1::kan(b)      1.5 ± 0.2
C54-Tn400.23(c)      3.3 ± 0.4

(a) Adherence was measured in a 30 minute assay and was calculated by dividing the
number of adherent bacteria by the number of inoculated bacteria. Values are the
mean ± SEM of measurements made in triplicate from representative experiments.
(b) Strain C54-HA1::kan was constructed by transforming C54bp with linearized
pHMW8-6, which contains the HA1 gene with an intragenic kanamycin cassette.
(c) Strain C54-Tn400.23 contains a mini-Tn10 kan element in the hsf locus (St. Geme
et al., Mol. Microbiol. 15:77-85 (1995)).



To determine the direction of transcription and identify plasmid-encoded proteins,
pDC601 and pDC602 were subsequently introduced into E. coli BL21(DE3),
producing BL21(DE3)/pDC601 and BL21(DE3)/pDC602, respectively. As a
negative control, pT7-7 was also transformed into BL21(DE3). The T7 promoter
in these three strains was induced with IPTG, and induced proteins were detected
using trans-[35S]-label. As shown in Figure 8, induction of BL21(DE3)/pDC601
resulted in expression of a large protein over 200 kDa in size along with several
slightly smaller proteins, which presumably represent degradation products. In
contrast, when BL21(DE3)/pDC602 and BL21(DE3)/pT7-7 were induced, there
was no expression of these proteins. This experiment indicated that the genetic
material contained in the 8.3 kb XbaI fragment is transcribed from left to right as
shown in Figure 7 and suggested that a single long open reading frame may be
present.



Nucleotide sequencing

Nucleotide sequence was determined using a Sequenase kit and double-stranded
[...] both strands by primer walking. DNA sequence analysis was performed using the
Genetics Computer Group (GCG) software package from the University of
Wisconsin (Devereux, J., P. Haeberli, and O. Smithies. 1984. A comprehensive
set of sequence analysis programs for the VAX. Nucleic Acids Res. 12:387-395.).
Sequence similarity searches were carried out using the BLAST program of the
National Center for Biotechnology Information (Altschul, S.F., W. Gish, W. Miller,
E.W. Myers, and D.J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol.
215:403-410.).

Sequencing of the 8.3 kb XbaI fragment revealed a 7059 bp gene, which is
designated for literature purposes as hsf for Haemophilus surface fibril, and is
referred to herein as HA2. This gene encodes a 2353-amino acid polypeptide,
referred to as Hsf or HA2, with a calculated molecular mass of 243.8 kDa, which
is similar in size to the observed protein species detected after induction of
BL21(DE3)/pDC601. The HA2 gene has a GC content of 42.8%, somewhat greater
than the published estimate of 38-39% for the whole genome (Fleischmann et al.
1995. Whole-genome random sequencing and assembly of Haemophilus influenzae
Rd. Science 269:496-512; Kilian, M. 1976. A taxonomic study of the genus
Haemophilus, with the proposal of a new species. J. Gen. Microbiol. 93:9-62.). A
putative ribosomal binding site with the sequence AAGGTA begins [...] base pairs
upstream of the presumed initiation codon. A sequence similar to a rho-independent
transcription terminator is present beginning 20 nucleotides beyond the stop codon
and contains interrupted inverted repeats with the potential for forming a hairpin
structure containing a loop of two bases and a stem of 11 bases. Of note, a string
of 29 thymines spans the region from 149 to 121 nucleotides upstream of HA2.
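
The two sequence statistics quoted above, GC content and the relationship between gene length and polypeptide length, are simple to compute. The sketch below is illustrative only: the function names are hypothetical, and the only sequence used is the six-base putative ribosomal binding site quoted in the text, since the full HA2 sequence appears in the figures rather than in this passage.

```python
def gc_content(seq):
    """Percentage of G and C among the unambiguous bases of a DNA sequence."""
    bases = [base for base in seq.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * sum(base in "GC" for base in bases) / len(bases)


def codon_count(length_bp):
    """Number of whole codons in a coding region of the given length."""
    if length_bp % 3 != 0:
        raise ValueError("length is not a multiple of three")
    return length_bp // 3


# The stated 7059 bp gene length divides into 2353 codons, which matches the
# stated 2353-residue polypeptide.
hsf_codons = codon_count(7059)

# GC content of the putative ribosomal binding site AAGGTA (2 of 6 bases).
rbs_gc = gc_content("AAGGTA")
```

Applied to the full gene sequence, gc_content would be expected to reproduce the 42.8% figure, assuming the same base-counting convention as the GCG package.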

5 Homology to HAlfHAl 

The nontypable H. influenzae nonpilus protein HA1 (called Hia in the
literature) promotes attachment to cultured human epithelial cells as outlined above.
Comparison of the predicted amino acid sequence of HA2 and the sequence of HA1
revealed 81% similarity and 72% identity overall. As depicted in Figure 5, the two
sequences are highly conserved at their N-terminal and C-terminal ends, and both
contain a Walker box nucleotide-binding motif. Interestingly, HA1 is encoded by
a 3.2 kb gene and is only 115 kDa. In this context, it is noteworthy that three
separate stretches of HA2 (corresponding to amino acids 174 to 608, 847 to 1291,
and 1476 to 1914, respectively) show significant homology to the region of HA1
defined by amino acids 221 to 658 (Figure 5). Table 2 summarizes the level of
similarity and identity among these three stretches of HA2. This suggests
that the larger size of HA2 may relate in part to the presence of a
repeated domain which is present in single copy in HA1.
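
Percent identity and percent similarity figures of the kind quoted above are computed over aligned residues. The sketch below illustrates the arithmetic on two short, ungapped fragments of HA1 and HA2 taken from the sequence listing. It is an assumption-laden illustration: the conservative-substitution groups are a simplified scheme chosen for this example and are not the GCG scoring that produced the published values, and the function and variable names are hypothetical.

```python
# Simplified conservative-substitution groups (an assumption for this sketch,
# not the scoring scheme used by the GCG package).
GROUPS = ["AGST", "ILVM", "FYW", "DENQ", "KRH", "C", "P"]
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}

# Residue ranges of the HA2 repeats named in the text (inclusive).
HA2_REPEATS = {"repeat 1": (174, 608), "repeat 2": (847, 1291), "repeat 3": (1476, 1914)}


def identity_and_similarity(a, b):
    """Return (percent identity, percent similarity) for two aligned sequences.

    Columns containing a gap ('-') in either sequence are skipped."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            identical += 1
            similar += 1
        elif x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y):
            similar += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared


if __name__ == "__main__":
    ha1_fragment = "TVKYDAKVGDGLKLDGDK"   # from SEQ ID NO:2
    ha2_fragment = "TVKYDAKVGDGLKFDSDK"   # from the translation in SEQ ID NO:3
    print(identity_and_similarity(ha1_fragment, ha2_fragment))
    for name, (start, end) in HA2_REPEATS.items():
        print(name, end - start + 1, "residues")
```

The repeat lengths printed at the end (435, 445 and 439 residues) are comparable to the 438-residue HA1 region spanning amino acids 221 to 658, which is consistent with the single-copy versus repeated-domain interpretation in the text.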

Table 2. Percent similarity and percent identity between HA2 repeats.

                   Percent Similarity/Percent Identity

                   HA2 174-608*    HA2 847-1291*    HA2 1476-1914*

HA2 174-608             -              65/53            76/60

HA2 847-1291                             -              70/56

HA2 1476-1914                                             -

*Numbers correspond to amino acid residue positions in the full-length HA2 (Hsf)
protein.

To evaluate whether HA1 and HA2 are alleles of the same locus, a series of Southern
blots was performed. Samples of chromosomal DNA from strains C54 and 11 were
subjected to digestion with BglII, ClaI, and either PstI or XbaI. Resulting DNA
fragments were separated by agarose electrophoresis and transferred bidirectionally
to nitrocellulose membranes. One membrane was probed with a 3.3 kb internal
fragment of the HA2 gene (Figure 7), and the other membrane was probed with a
1.6 kb intragenic fragment of the HA1 gene. As shown in Figure 9, both probes
recognized exactly the same chromosomal fragments.

To obtain additional evidence that the HA2 and HA1 genes are homologs, the
inactivation of HA2 by transformation of H. influenzae strain C54bp with
insertionally inactivated HA1 was attempted. The plasmid pHMW8-6 (Barenkamp,
S.J., and J.W. St. Geme, III. Identification of a second family of high molecular weight
adhesion proteins expressed by nontypable Haemophilus influenzae. Mol. Microbiol.,
in press), which contains the HA1 gene with an intragenic kanamycin cassette, was
linearized with NdeI and introduced into competent C54. Southern hybridization
confirmed insertion of the kanamycin cassette into HA2 (not shown). Furthermore,
examination of the C54 mutant by negative staining transmission electron microscopy
revealed the loss of surface fibrils (not shown). Consistent with these findings, the
mutant strain demonstrated minimal attachment to Chang conjunctival cells (Table
1).

In additional experiments, the cellular binding specificities conferred by the HA2
and HA1 proteins were compared. As shown in Figure 10, DH5α/pDC601
(expressing HA2) demonstrated high level attachment to Chang cells, KB cells, HeLa
cells and Intestine 407 cells, moderate level attachment to HEp-2 cells, and minimal
attachment to HEC-1B cells, ME-180 cells, and CHO-K1 cells. DH5α harboring
pHMW8-5 (expressing HA1) showed virtually the same pattern of attachment.
Giemsa staining and subsequent examination by light microscopy confirmed these
viable count adherence assay results.

Homology to other bacterial extracellular proteins 

A protein sequence similarity search was performed with the HA2 predicted amino
acid sequence using the BLAST network service of the National Center for
Biotechnology Information (Altschul, S.F., W. Gish, W. Miller, E.W. Myers, and
D.J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403-410.).
This search revealed low-level sequence similarity to a series of other bacterial
adherence factors, including HMW1 and HMW2 (the proteins previously identified
as being important adhesins in HA1-deficient nontypable H. influenzae strains: St.
Geme, J.W., III, S. Falkow, and S.J. Barenkamp. 1993. High-molecular-weight proteins
of nontypable Haemophilus influenzae mediate attachment to human epithelial cells.
Proc. Natl. Acad. Sci. U.S.A. 90:2875-2879.), AIDA-I (an adhesion protein expressed
by some diarrheagenic E. coli strains: Benz, I., and M.A. Schmidt. 1992. AIDA-I,
the adhesin involved in diffuse adherence of the diarrhoeagenic Escherichia coli strain
2787 (O126:H27), is synthesized via a precursor molecule. Mol. Microbiol.
6:1539-1546.), and Tsh (a hemagglutinin produced by an avian pathogenic E. coli
strain: Provence, D., and R. Curtiss III. 1994. Isolation and characterization of a gene
involved in hemagglutination by an avian pathogenic Escherichia coli strain. Infect.
Immun. 62:1369-1380.). In addition, HA2 showed homology to SepA, a Shigella
flexneri secreted protein that appears to play a role in tissue invasion
(Benjelloun-Touimi, Z., P.J. Sansonetti, and C. Parsot. 1995. SepA, the major
extracellular protein of Shigella flexneri: autonomous secretion and involvement in
tissue invasion. Mol. Microbiol. 17:123-135.). Alignment of HA2 with HMW1,
HMW2, AIDA-I, Tsh, and SepA revealed a highly conserved N-terminal domain
(Figure 11). In AIDA-I, Tsh, and SepA, this N-terminal extremity precedes a typical
procaryotic signal sequence (Benjelloun-Touimi, Z., P.J. Sansonetti, and C. Parsot.
1995. SepA, the major extracellular protein of Shigella flexneri: autonomous secretion
and involvement in tissue invasion. Mol. Microbiol. 17:123-135.). ... conserved
domain precedes a 26 amino acid segment that is characterized by ...
alanine-glutamine-alanine.

Presence of an HA2 homolog in other encapsulated and nonencapsulated strains

... and Eagan (St. Geme, J.W., and D. Cutter. 1995. Evidence that
surface fibrils expressed by Haemophilus influenzae type b promote attachment to
human epithelial cells. Mol. Microbiol. 15:77-85.). To define the extent to which
the HA2 locus is shared by other type b strains, ... isolates were examined by
Southern analysis. Among these strains, ... belonging to phylogenic division I and
four belonging to division II (Musser, J.M., J.S. Kroll, E.R. Moxon, and R.K.
Selander. 1988. Evolutionary genetics of the encapsulated strains of Haemophilus
influenzae. Proc. Natl. Acad. Sci. USA 85:7758-7762.). Chromosomal DNA was
digested with ... and then probed with the intragenic 3.3 kb fragment of the hsf
gene. As shown ... type b raised the question of the prevalence of this locus in
other non-type b encapsulated H. influenzae. ... of a series of type a, c, d, e, and f
isolates again demonstrated a homolog in all cases (Figure ...).

... to encode a protein of 298 amino acids, ...
the C54 HA2 gene, and the Rd derived amino acid sequence is 62% identical and
75% similar to C54 HA2. Interestingly, the Rd open reading frame appears to be
truncated due to a "premature" stop codon.

Previous experiments revealed that 13 of 15 nontypable strains lacking an
HMW1/HMW2-related protein had evidence of an HA1 homolog (Barenkamp, S.J.,
and J.W. St. Geme, III. Identification of a second family of high molecular weight
adhesion proteins expressed by nontypable Haemophilus influenzae. Mol. Microbiol.,
in press). Consistent with the demonstration that HA2 and HA1 are homologous,
Southern analysis of these 15 strains, probing with the 3.3 kb fragment of hsf,
demonstrated hybridization in 12 of the same 13 (not shown).

Chromosomal location of the HA2 locus 

In earlier work, the HA1 locus in nontypable strain 11 was found to be flanked
upstream by an open reading frame with significant homology to E. coli
exoribonuclease II (Barenkamp, S.J., and J.W. St. Geme, III. Identification of a second
family of high molecular weight adhesion proteins expressed by nontypable
Haemophilus influenzae. Mol. Microbiol., in press.). Similarly, the HA2 locus in
strain C54 is flanked on the 5' side by an open reading frame with similarity
to E. coli exoribonuclease II. This gene terminates 357 base pairs before the HA2 start
codon and encodes a protein with a predicted amino acid sequence that is 61% similar
and 33% identical at its C-terminal end to exoribonuclease II. Of note, the Rd HA2
homolog is also flanked upstream by the exoribonuclease II locus.

EXAMPLE 3 
Cloning of HA3 

Recombinant phage containing the nontypable Haemophilus strain 32 HA3 gene were
isolated and characterized using methods modified slightly from those described
previously (Barenkamp and St. Geme, Molecular Microbiology 1996, in press). In
brief, chromosomal DNA from strain 32 was prepared by a modification of the method
of Marmur (Marmur, 1961). Sau3A partial restriction digests of the DNA were
prepared and fractionated on 0.7% agarose gels. Fractions containing DNA fragments
in the 9- to 20-kbp range were pooled, and a library was prepared by ligation into
λEMBL3 arms. Ligation mixtures were packaged in vitro with Gigapack®
(Stratagene, La Jolla, CA) and plate amplified in a P2 lysogen of E. coli LE392.
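
As a rough guide to how many recombinant phage such a library must contain, the standard Clarke-Carbon estimate N = ln(1 - P) / ln(1 - f) can be applied, where f is the insert size divided by the genome size and P is the desired probability of recovering any given locus. The sketch below is illustrative only and is not taken from the original methods. It assumes the 1,830,137 bp genome size reported for strain Rd by Fleischmann et al.; the genome of strain 32 may differ, and the function name is hypothetical.

```python
import math

# Assumption: genome size of H. influenzae Rd (Fleischmann et al., 1995).
# The nontypable strain 32 genome may be somewhat different in size.
RD_GENOME_BP = 1_830_137


def clones_needed(genome_bp, insert_bp, probability=0.99):
    """Clarke-Carbon estimate of the number of independent clones required
    so that a given locus is represented with the stated probability."""
    if not 0 < probability < 1:
        raise ValueError("probability must lie strictly between 0 and 1")
    fraction = insert_bp / genome_bp
    if not 0 < fraction < 1:
        raise ValueError("insert must be smaller than the genome")
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


if __name__ == "__main__":
    # The pooled fractions covered roughly 9 to 20 kbp inserts.
    for insert_bp in (9_000, 20_000):
        print(insert_bp, "bp inserts:", clones_needed(RD_GENOME_BP, insert_bp))
```

Because the pooled fragments span a range of sizes, the two values printed bracket the requirement; under these assumptions a library on the order of a thousand independent phage would be expected to contain the locus of interest.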

Lambda plaque screening was performed using a mixture of three PCR products 
derived from strain 32 chromosomal DNA. These PCR products were amplified using 
primer pairs previously shown to amplify DNA segments at the 5' end of the strain 
11 HA1 gene. The primers were as follows:



Primer designation    strand      sequence

44P                   positive    CCG TGC TTG CCO AAC ACG CTT

64P                   positive    GCT GCC ACC TTG CAC AAC AAC

93G-2                 positive    CTT TCA ATG CCA GAA AGT AGG

18T-1                 negative    CTT CAA CCG TTG CGG ACA ACA



Each of the positive strand primers was used with the single negative strand primer 
to generate the three fragments used for probing the library. 
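
Because all three probes share the negative strand primer 18T-1, its reverse complement marks the common downstream end of the amplified fragments on the coding strand. The sketch below locates that site in the first 180 nucleotides of SEQ ID NO:1 (the strain 11 HA1 coding sequence) by an ungapped scan. It is a hedged illustration rather than the authors' procedure: the function names are hypothetical, the sequence is transcribed by hand from the listing, and a mismatch reported by the scan may reflect transcription noise in the listing or primer rather than a true difference.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Nucleotides 1-180 of SEQ ID NO:1, transcribed in 10-nt blocks.
HA1_5PRIME = "".join([
    "ATGAACAAAA", "TTTTTAACGT", "TATTTGGAAT", "GTTGTGACTC", "AAACTTGGGT", "TGTCGTATCT",
    "GAACTCACTC", "GCACCCAGAC", "CAAATGCGCC", "TCCGCCACCG", "TGGCGGTTGC", "CGTATTGGCA",
    "ACCCTGTTGT", "CCGCAACGGT", "TGAGGCGAAC", "AACAATACTC", "CTGTTACGAA", "TAAGTTGAAG",
])

PRIMER_18T1 = "CTTCAACCGTTGCGGACAACA"   # negative strand primer, written 5' to 3'


def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def best_hit(template, probe):
    """Return (start, mismatches) for the ungapped window of template that is
    closest to probe; start is 1-based, and the first best window wins ties."""
    best = None
    for i in range(len(template) - len(probe) + 1):
        window = template[i:i + len(probe)]
        mismatches = sum(a != b for a, b in zip(window, probe))
        if best is None or mismatches < best[1]:
            best = (i + 1, mismatches)
    return best


if __name__ == "__main__":
    site = reverse_complement(PRIMER_18T1)
    print("coding-strand site:", site)
    print("best hit (start, mismatches):", best_hit(HA1_5PRIME, site))
```

With the sequences as transcribed here, the best window begins at nucleotide 125 of SEQ ID NO:1 with a single mismatch, which places the 18T-1 annealing site just inside the 5' end of the HA1 coding region. The positive strand primers are not found in SEQ ID NO:1, since that listing begins at the start codon.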






The PCR products generated from strain 11 and strain 32 chromosomal DNA were
identical in size, suggesting that the nucleotide sequences of these chromosomal
regions were similar in the two strains. Plaque screening was performed using
standard methodology (Berger and Kimmel, 1987) at high stringency: final wash
conditions were 65°C for 1 hour in buffer containing 2X SSC and 1% SDS. Positive
plaques were identified by autoradiography and plaque purified, and phage DNA was
purified by standard methods. The same primer pairs used to generate the screening
probes were then used to localize the HA3 gene by amplifying various restriction
fragments derived from the phage DNA. Once localized, the strain 32 HA3 gene
and flanking DNA were sequenced using standard methods.

In order to construct strain 32 isogenic Haemophilus influenzae mutants deficient
in expression of the HA3 gene, bacteria were made competent using the MIV method
(Herriott et al., 1970) and were transformed with linearized pHMW8-6, selecting for
kanamycin resistance. Allelic exchange was confirmed by Southern analysis. The
mutants that no longer expressed HA3 exhibited a marked decrease in binding to
Chang epithelial cells, using the methods outlined above (data not shown).

Expression in non-adherent strains of E. coli did not result in adherence, although
it has not been confirmed that the protein was actually expressed.

SEQUENCE LISTING

(1) GENERAL INFORMATION: 

(i) APPLICANT: Washington University 

(ii) TITLE OF INVENTION: HAEMOPHILUS ADHESION PROTEINS

(iii) NUMBER OF SEQUENCES: 19

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: Flehr, Hohbach, Test, Albritton & Herbert

(B) STREET: Four Embarcadero Center, Suite 3400

(C) CITY: San Francisco 

(D) STATE: California 

(E) COUNTRY: United States 

(F) ZIP: 94111-4187 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: UNKNOWN 

(B) FILING DATE : 22-MAR-1996 

(C) CLASSIFICATION : 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER : US 08/409,995 

(B) FILING DATE: 24-MAR-1995

(viii) ATTORNEY /AGENT INFORMATION: 

(A) NAME: Silva, Robin M. 

(B) REGISTRATION NUMBER: 38,304 

(C) REFERENCE/DOCKET NUMBER: FP61053-1/RFT/RMS

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (415) 781-1989 

(B) TELEFAX: (415) 398-3249 

(C) TELEX: 910 277299 



(2) INFORMATION FOR SEQ ID NO:l: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3294 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

WO 96/30519 PCT/US96/04031

ATGAACAAAA TTTTTAACGT TATTTGGAAT GTTGTGACTC AAACTTGGGT TGTCGTATCT 60

GAACTCACTC GCACCCAGAC CAAATGCGCC TCCGCCACCG TGGCGGTTGC CGTATTGGCA 120

ACCCTGTTGT CCGCAACGGT TGAGGCGAAC AACAATACTC CTGTTACGAA TAAGTTGAAG 180

GCTTATGGCG ATGCGAATTT TAATTTCACT AATAATTCGA TAGCAGATGC AGAAAAACAA 240

GTTCAAGAGG CTTATAAAGG TTTATTAAAT CTAAATGAAA AAAATGCGAG TGATAAACTG 300

TTGGTGGAGG ACAATACTGC GGCGACCGTA GGCAATTTGC GTAAATTGGG CTGGGTATTG 360

TCTAGCAAAA ACGGCACAAG GAACGAGAAA AGCCAACAAG TCAAACATGC GGATGAAGTG 420

TTGTTTGAAG GCAAAGGCGG TGTGCAGGTT ACTTCCACCT CTGAAAACGG CAAACACACC 480

ATTACCTTTG CTTTAGCGAA AGACCTTGGT GTGAAAACTG CGACTGTGAG TGATACCTTA 540

ACGATTGGCG GTGGTGCTGC TGCAGGTGCT ACAACAACAC CGAAAGTGAA TGTAACTAGT 600

ACAACTGATG GCTTGAAGTT CGCTAAAGAT GCTGCGGGTG CTAATGGCGA TACTACGGTT 660

CACTTGAATG GTATTGGTTC AACCTTGACA GACACGCTTG TGGGTTCTCC TGCTACTCAT 720

ATTGACGGAG GAGATCAAAG TACGCATTAC ACTCGTGCAG CAAGTATCAA GGATGTCTTG 780

AATGCGGGTT GGAATATCAA GGGTGTTAAA GCTGGCTCAA CAACTGGTCA ATCAGAAAAT 840

GTCGATTTTG TTCATACTTA CGATACTGTT GAGTTCTTGA GTGCGGATAC AGAGACCACG 900

ACTGTTACTG TAGATAGCAA AGAAAACGGT AAGAGAACCG AAGTTAAAAT CGGTGCGAAG 960

ACTTCTGTTA TCAAAGAAAA AGACGGTAAG TTATTTACTG GAAAAGCTAA CAAAGAGACA 1020

AATAAAGTTG ATGGTGCTAA CGCGACTGAA GATGCAGACG AAGGCAAAGG CTTAGTGACT 1080

GCGAAAGATG TGATTGACGC AGTGAATAAG ACTGGTTGGA GAATTAAAAC AACCGATGCT 1140

AATGGTCAAA ATGGCGACTT CGCAACTGTT GCATCAGGCA CAAATGTAAC CTTTGCTAGT 1200

GGTAATGGTA CAACTGCGAC TGTAACTAAT GGCACCGATG GTATTACCGT TAAGTATGAT 1260

GCGAAAGTTG GCGACGGCTT AAAACTAGAT GGCGATAAAA TCGCTGCAGA TACGACCGCA 1320

CTTACTGTGA ATGATGGTAA GAACGCTAAT AATCCGAAAG GTAAAGTGGC TGATGTTGCT 1380

TCAACTGACG AGAAGAAATT GGTTACAGCA AAAGGTTTAG TAACAGCCTT AAACAGTCTA 1440

AGCTGGACTA CAACTGCTGC TGAGGCGGAC GGTGGTACGC TTGATGGAAA TGCAAGTGAG 1500

CAAGAAGTTA AAGCGGGCGA TAAAGTAACC TTTAAAGCAG GCAAGAACTT AAAAGTGAAA 1560

CAAGAGGGTG CGAACTTTAC TTATTCACTG CAAGATGCTT TAACAGGCTT AACGAGCATT 1620

ACTTTAGGTA CAGGAAATAA TGGTGCGAAA ACTGAAATCA ACAAAGACGG CTTAACCATC 1680

ACACCAGCAA ATGGTGCGGG TGCAAATAAT GCAAACACCA TCAGCGTAAC CAAAGACGGC 1740

ATTAGTGCGG GCGGTCAGTC GGTTAAAAAC GTTGTGAGCG GACTGAAGAA ATTTGGTGAT 1800

GCGAATTTCG ATCCGCTGAC TAGCTCCGCC GACAACTTAA CGAAACAAAA TGACGATGCC 1860

TATAAAGGCT TGACCAATTT GGATGAAAAA GGTACAGACA AGCAAACTCC AGTTGTTGCC 1920

GACAATACCG CCGCAACCGT GGGCGATTTG CGCGGCTTGG GCTGGGTCAT TTCTGCGGAC 1980

AAAACCACAG GCGGCTCAAC GGAATATCAC GATCAAGTTC GGAATGCGAA CGAAGTGAAA 2040

TTCAAAAGCG GCAACGGTAT CAATGTTTCC GGTAAAACGG TCAACGGTAG GCGTGAAATT 2100

ACTTTTGAAT TGGCTAAAGG TGAAGTGGTT AAATCGAATG AATTTACCGT CAAAGAAACC 2160

AATGGAAAGG AAACGAGCCT GGTTAAAGTT GGCGATAAAT ATTACAGCAA AGAGGATATT 2220

GACTTAACAA CAGGTCAGCC TAAATTAAAA GATGGCAATA CAGTTGCTGC GAAATATCAA 2280

GATAAAGGTG GCAAAGTCGT TTCTGTAACG GATAATACTG AAGCTACCAT AACCAACAAA 2340

GGTTCTGGCT ATGTAACAGG TAACCAAGTG GCAGATGCGA TTGCGAAATC AGGCTTTGAG 2400

CTTGGCTTGG CTGATGAAGC TGATGCGAAA CGGGCGTTTG ATGATAAGAC AAAAGCCTTA 2460

TCTGCTGGTA CAACGGAAAT TGTAAATGCC CACGATAAAG TCCGTTTTGC TAATGGTTTA 2520

AATACCAAAG TGAGCGCGGC AACGGTGGAA AGCACCGATG CAAACGGCGA TAAAGTGACC 2580

ACAACCTTTG TGAAAACCGA TGTGGAATTG CCTTTAACGC AAATCTACAA TACCGATGCA 2640

AACGGTAAGA AAATCACTAA AGTTGTCAAA GATGGGCAAA CTAAATGGTA TGAACTGAAT 2700

GCTGACGGTA CGGCTGATAT GACCAAAGAA GTTACCCTCG GTAACGTGGA TTCAGACGGC 2760

AAGAAAGTTG TGAAAGACAA CGATGGCAAG TGGTATCACG CCAAAGCTGA CGGTACTGCG 2820

GATAAAACCA AAGGCGAAGT GAGCAATGAT AAAGTTTCTA CCGATGAAAA ACACGTTGTC 2880

AGCCTTGATC CAAATGATCA ATCAAAAGGT AAAGGTGTCG TGATTGACAA TGTGGCTAAT 2940

GGCGATATTT CTGCCACTTC CACCGATGCG ATTAACGGAA GTCAGTTGTA TGCTGTGGCA 3000

AAAGGGGTAA CAAACCTTGC TGGACAAGTG AATAATCTTG AGGGCAAAGT GAATAAAGTG 3060

GGCAAACGTG CAGATGCAGG TACAGCAAGT GCATTAGCGG CTTCACAGTT ACCACAAGCC 3120

ACTATGCCAG GTAAATCAAT GGTTGCTATT GCGGGAAGTA GTTATCAAGG TCAAAATGGT 3180

TTAGCTATCG GGGTATCAAG AATTTCCGAT AATGGCAAAG TGATTATTCG CTTGTCAGGC 3240

ACAACCAATA GTCAAGGTAA AACAGGCGTT GCAGCAGGTG TTGGTTACCA GTGG 3294

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1098 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown 



(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Met Asn Lys Ile Phe Asn Val Ile Trp Asn Val Val Thr Gln Thr Trp
1 5 10 15

Val Val Val Ser Glu Leu Thr Arg Thr His Thr Lys Cys Ala Ser Ala
20 25 30

Thr Val Ala Val Ala Val Leu Ala Thr Leu Leu Ser Ala Thr Val Glu
35 40 45

Ala Asn Asn Asn Thr Pro Val Thr Asn Lys Leu Lys Ala Tyr Gly Asp
50 55 60

Ala Asn Phe Asn Phe Thr Asn Asn Ser Ile Ala Asp Ala Glu Lys Gln
65 70 75 80

Val Gln Glu Ala Tyr Lys Gly Leu Leu Asn Leu Asn Glu Lys Asn Ala
85 90 95

Ser Asp Lys Leu Leu Val Glu Asp Asn Thr Ala Ala Thr Val Gly Asn
100 105 110

Leu Arg Lys Leu Gly Trp Val Leu Ser Ser Lys Asn Gly Thr Arg Asn
115 120 125

Glu Lys Ser Gln Gln Val Lys His Ala Asp Glu Val Leu Phe Glu Gly
130 135 140

Lys Gly Gly Val Gln Val Thr Ser Thr Ser Glu Asn Gly Lys His Thr
145 150 155 160

Ile Thr Phe Ala Leu Ala Lys Asp Leu Gly Val Lys Thr Ala Thr Val
165 170 175

Ser Asp Thr Leu Thr Ile Gly Gly Gly Ala Ala Ala Gly Ala Thr Thr
180 185 190

Thr Pro Lys Val Asn Val Thr Ser Thr Thr Asp Gly Leu Lys Phe Ala
195 200 205

Lys Asp Ala Ala Gly Ala Asn Gly Asp Thr Thr Val His Leu Asn Gly
210 215 220

Ile Gly Ser Thr Leu Thr Asp Thr Leu Val Gly Ser Pro Ala Thr His
225 230 235 240

Ile Asp Gly Gly Asp Gln Ser Thr His Tyr Thr Arg Ala Ala Ser Ile
245 250 255

Lys Asp Val Leu Asn Ala Gly Trp Asn Ile Lys Gly Val Lys Ala Gly
260 265 270

Ser Thr Thr Gly Gln Ser Glu Asn Val Asp Phe Val His Thr Tyr Asp
275 280 285

Thr Val Glu Phe Leu Ser Ala Asp Thr Glu Thr Thr Thr Val Thr Val
290 295 300

Asp Ser Lys Glu Asn Gly Lys Arg Thr Glu Val Lys Ile Gly Ala Lys
305 310 315 320

Thr Ser Val Ile Lys Glu Lys Asp Gly Lys Leu Phe Thr Gly Lys Ala
325 330 335

Asn Lys Glu Thr Asn Lys Val Asp Gly Ala Asn Ala Thr Glu Asp Ala
340 345 350

Asp Glu Gly Lys Gly Leu Val Thr Ala Lys Asp Val Ile Asp Ala Val
355 360 365

Asn Lys Thr Gly Trp Arg Ile Lys Thr Thr Asp Ala Asn Gly Gln Asn
370 375 380

Gly Asp Phe Ala Thr Val Ala Ser Gly Thr Asn Val Thr Phe Ala Ser
385 390 395 400

Gly Asn Gly Thr Thr Ala Thr Val Thr Asn Gly Thr Asp Gly Ile Thr
405 410 415

Val Lys Tyr Asp Ala Lys Val Gly Asp Gly Leu Lys Leu Asp Gly Asp
420 425 430

Lys Ile Ala Ala Asp Thr Thr Ala Leu Thr Val Asn Asp Gly Lys Asn
435 440 445

Ala Asn Asn Pro Lys Gly Lys Val Ala Asp Val Ala Ser Thr Asp Glu
450 455 460

Lys Lys Leu Val Thr Ala Lys Gly Leu Val Thr Ala Leu Asn Ser Leu
465 470 475 480

Ser Trp Thr Thr Thr Ala Ala Glu Ala Asp Gly Gly Thr Leu Asp Gly
485 490 495

Asn Ala Ser Glu Gln Glu Val Lys Ala Gly Asp Lys Val Thr Phe Lys
500 505 510

Ala Gly Lys Asn Leu Lys Val Lys Gln Glu Gly Ala Asn Phe Thr Tyr
515 520 525

Ser Leu Gln Asp Ala Leu Thr Gly Leu Thr Ser Ile Thr Leu Gly Thr
530 535 540

Gly Asn Asn Gly Ala Lys Thr Glu Ile Asn Lys Asp Gly Leu Thr Ile
545 550 555 560

Thr Pro Ala Asn Gly Ala Gly Ala Asn Asn Ala Asn Thr Ile Ser Val
565 570 575

Thr Lys Asp Gly Ile Ser Ala Gly Gly Gln Ser Val Lys Asn Val Val
580 585 590

Ser Gly Leu Lys Lys Phe Gly Asp Ala Asn Phe Asp Pro Leu Thr Ser
595 600 605

Ser Ala Asp Asn Leu Thr Lys Gln Asn Asp Asp Ala Tyr Lys Gly Leu
610 615 620

Thr Asn Leu Asp Glu Lys Gly Thr Asp Lys Gln Thr Pro Val Val Ala
625 630 635 640

Asp Asn Thr Ala Ala Thr Val Gly Asp Leu Arg Gly Leu Gly Trp Val
645 650 655

Ile Ser Ala Asp Lys Thr Thr Gly Gly Ser Thr Glu Tyr His Asp Gln
660 665 670

Val Arg Asn Ala Asn Glu Val Lys Phe Lys Ser Gly Asn Gly Ile Asn
675 680 685

Val Ser Gly Lys Thr Val Asn Gly Arg Arg Glu Ile Thr Phe Glu Leu
690 695 700

Ala Lys Gly Glu Val Val Lys Ser Asn Glu Phe Thr Val Lys Glu Thr
705 710 715 720

Asn Gly Lys Glu Thr Ser Leu Val Lys Val Gly Asp Lys Tyr Tyr Ser
725 730 735

Lys Glu Asp Ile Asp Leu Thr Thr Gly Gln Pro Lys Leu Lys Asp Gly
740 745 750

Asn Thr Val Ala Ala Lys Tyr Gln Asp Lys Gly Gly Lys Val Val Ser
755 760 765

Val Thr Asp Asn Thr Glu Ala Thr Ile Thr Asn Lys Gly Ser Gly Tyr
770 775 780

Val Thr Gly Asn Gln Val Ala Asp Ala Ile Ala Lys Ser Gly Phe Glu
785 790 795 800

Leu Gly Leu Ala Asp Glu Ala Asp Ala Lys Arg Ala Phe Asp Asp Lys
805 810 815

Thr Lys Ala Leu Ser Ala Gly Thr Thr Glu Ile Val Asn Ala His Asp
820 825 830

Lys Val Arg Phe Ala Asn Gly Leu Asn Thr Lys Val Ser Ala Ala Thr
835 840 845

Val Glu Ser Thr Asp Ala Asn Gly Asp Lys Val Thr Thr Thr Phe Val
850 855 860

Lys Thr Asp Val Glu Leu Pro Leu Thr Gln Ile Tyr Asn Thr Asp Ala
865 870 875 880

Asn Gly Lys Lys Ile Thr Lys Val Val Lys Asp Gly Gln Thr Lys Trp
885 890 895

Tyr Glu Leu Asn Ala Asp Gly Thr Ala Asp Met Thr Lys Glu Val Thr
900 905 910

Leu Gly Asn Val Asp Ser Asp Gly Lys Lys Val Val Lys Asp Asn Asp
915 920 925

Gly Lys Trp Tyr His Ala Lys Ala Asp Gly Thr Ala Asp Lys Thr Lys
930 935 940

Gly Glu Val Ser Asn Asp Lys Val Ser Thr Asp Glu Lys His Val Val
945 950 955 960

Ser Leu Asp Pro Asn Asp Gln Ser Lys Gly Lys Gly Val Val Ile Asp
965 970 975

Asn Val Ala Asn Gly Asp Ile Ser Ala Thr Ser Thr Asp Ala Ile Asn
980 985 990

Gly Ser Gln Leu Tyr Ala Val Ala Lys Gly Val Thr Asn Leu Ala Gly
995 1000 1005

Gln Val Asn Asn Leu Glu Gly Lys Val Asn Lys Val Gly Lys Arg Ala
1010 1015 1020

Asn Ala Gly Thr Ala Ser Ala Leu Ala Ala Ser Gln Leu Pro Gln Ala
1025 1030 1035 1040

Thr Met Pro Gly Lys Ser Met Val Ala Ile Ala Gly Ser Ser Tyr Gln
1045 1050 1055

Gly Gln Asn Gly Leu Ala Ile Gly Val Ser Arg Ile Ser Asp Asn Gly
1060 1065 1070

Lys Val Ile Ile Arg Leu Ser Gly Thr Thr Asn Ser Gln Gly Lys Thr
1075 1080 1085

Gly Val Ala Ala Gly Val Gly Tyr Gln Trp
1090 1095
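
The protein listing in SEQ ID NO:2 can be cross-checked against the nucleotide listing in SEQ ID NO:1 by conceptual translation. The sketch below translates the first 48 nucleotides of SEQ ID NO:1 with the standard genetic code and is offered only as an illustration. The function name is hypothetical, the sequence is transcribed by hand from the listing, and use of the standard code for H. influenzae is an assumption of this example.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Nucleotides 1-48 of SEQ ID NO:1, transcribed from the listing.
SEQ1_START = "ATGAACAAAATTTTTAACGTTATTTGGAATGTTGTGACTCAAACTTGG"


def translate(dna):
    """Translate a DNA string in frame 1; codons containing ambiguous bases
    become 'X', and a trailing partial codon is ignored."""
    dna = dna.upper().replace(" ", "")
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    print(translate(SEQ1_START))
    # Length bookkeeping for SEQ ID NO:1 and SEQ ID NO:2.
    print(3294 // 3, "codons in 3294 bp")
```

The translated residues correspond to the first row of SEQ ID NO:2 (Met Asn Lys Ile Phe Asn Val Ile Trp Asn Val Val Thr Gln Thr Trp), and 3294 base pairs correspond to the 1098 amino acids stated for that sequence. The same function could be applied row by row to flag residues where the two listings disagree.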

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7291 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 163..7221

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

TTTNTTTTTC TTATTTTTTT TTTTTTTTTT TTTTTTTTTT TTGAGGCTAA ACTTTTNGNA 60

AAATATCACT TTTTTATTCT CCAAATATAG AATAGAATAC GCACGATTTC ACTAAGAAAA 120

GTATATTTAT CATTAATTTT ATTAAATATA AGGTAAATAA AA ATG AAC AAA ATT 174
Met Asn Lys Ile
1

TTT AAC GTT ATT TGG AAT GTT ATG ACT CAA ACT TGG GTT GTC GTA TCT 222
Phe Asn Val Ile Trp Asn Val Met Thr Gln Thr Trp Val Val Val Ser
5 10 15 20

GAA CTC ACT CGC ACC CAC ACC AAA CGC GCC TCC GCA ACC GTG GAG ACC 270
Glu Leu Thr Arg Thr His Thr Lys Arg Ala Ser Ala Thr Val Glu Thr
25 30 35

GCC GTA TTG GCG ACA CTG TTG TTT GCA ACG GTT CAG GCG AAT GCT ACC 318
Ala Val Leu Ala Thr Leu Leu Phe Ala Thr Val Gln Ala Asn Ala Thr
40 45 50

GAT GAA GAT GAA GAG TTA GAC CCC GTA GTA CGC ACT GCT CCC GTG TTG 366
Asp Glu Asp Glu Glu Leu Asp Pro Val Val Arg Thr Ala Pro Val Leu
55 60 65

AGC TTC CAT TCC GAT AAA GAA GGC ACG GGA GAA AAA GAA GTT ACA GAA 414
Ser Phe His Ser Asp Lys Glu Gly Thr Gly Glu Lys Glu Val Thr Glu
70 75 80

AAT TCA AAT TGG GGA ATA TAT TTC GAC AAT AAA GGA GTA CTA AAA GCC 462
Asn Ser Asn Trp Gly Ile Tyr Phe Asp Asn Lys Gly Val Leu Lys Ala
85 90 95 100

GGA GCA ATC ACC CTC AAA GCC GGC GAC AAC CTG AAA ATC AAA CAA AAC 510
Gly Ala Ile Thr Leu Lys Ala Gly Asp Asn Leu Lys Ile Lys Gln Asn
105 110 115

ACC GAT GAA AGC ACC AAT GCC AGT AGC TTC ACC TAC TCG CTG AAA AAA 558
Thr Asp Glu Ser Thr Asn Ala Ser Ser Phe Thr Tyr Ser Leu Lys Lys
120 125 130

GAC CTC ACA GAT CTG ACC AGT GTT GCA ACT GAA AAA TTA TCG TTT GGC 606
Asp Leu Thr Asp Leu Thr Ser Val Ala Thr Glu Lys Leu Ser Phe Gly
135 140 145

GCA AAC GGC GAT AAA GTT GAT ATT ACC AGT GAT GCA AAT GGC TTG AAA 654
Ala Asn Gly Asp Lys Val Asp Ile Thr Ser Asp Ala Asn Gly Leu Lys
150 155 160

TTG GCG AAA ACA GGT AAC GGA AAT GTT CAT TTG AAT GGT TTG GAT TCA 702
Leu Ala Lys Thr Gly Asn Gly Asn Val His Leu Asn Gly Leu Asp Ser
165 170 175 180

ACT TTG CCT GAT GCG GTA ACG AAT ACA GGT GTG TTA AGT TCA TCA AGT 750
Thr Leu Pro Asp Ala Val Thr Asn Thr Gly Val Leu Ser Ser Ser Ser
185 190 195

TTT ACA CCT AAT GAT GTT GAA AAA ACA AGA GCT GCA ACT GTT AAA GAT 798
Phe Thr Pro Asn Asp Val Glu Lys Thr Arg Ala Ala Thr Val Lys Asp
200 205 210

GTT TTA AAT GCA GGT TGG AAC ATT AAA GGT GCT AAA ACT GCT GGA GGT 846
Val Leu Asn Ala Gly Trp Asn Ile Lys Gly Ala Lys Thr Ala Gly Gly
215 220 225

AAT GTT GAG AGT GTT GAT TTA GTG TCC GCT TAT AAT AAT GTT GAA TTT 894
Asn Val Glu Ser Val Asp Leu Val Ser Ala Tyr Asn Asn Val Glu Phe
230 235 240

ATT ACA GGC GAT AAA AAC ACG CTT GAT GTT GTA TTA ACA GCT AAA GAA 942
Ile Thr Gly Asp Lys Asn Thr Leu Asp Val Val Leu Thr Ala Lys Glu
245 250 255 260

AAC GGT AAA ACA ACC GAA GTG AAA TTC ACA CCG AAA ACC TCT GTT ATC 990
Asn Gly Lys Thr Thr Glu Val Lys Phe Thr Pro Lys Thr Ser Val Ile
265 270 275

AAA GAA AAA GAC GGT AAG TTA TTT ACT GGA AAA GAG AAT AAC GAC ACA 1038
Lys Glu Lys Asp Gly Lys Leu Phe Thr Gly Lys Glu Asn Asn Asp Thr
280 285 290

AAT AAA GTT ACA AGT AAC ACG GCG ACT GAT AAT ACA GAT GAG GGT AAT 1086
Asn Lys Val Thr Ser Asn Thr Ala Thr Asp Asn Thr Asp Glu Gly Asn
295 300 305

GGC TTA GTC ACT GCA AAA GCT GTG ATT GAT GCT GTG AAC AAG GCT GGT 1134
Gly Leu Val Thr Ala Lys Ala Val Ile Asp Ala Val Asn Lys Ala Gly
310 315 320

TGG AGA GTT AAA ACA ACT ACT GCT AAT GGT CAA AAT GGC GAC TTC GCA 1182
Trp Arg Val Lys Thr Thr Thr Ala Asn Gly Gln Asn Gly Asp Phe Ala
325 330 335 340

ACT GTT GCG TCA GGC ACA AAT GTA ACC TTT GAA AGT GGC GAT GGT ACA 1230
Thr Val Ala Ser Gly Thr Asn Val Thr Phe Glu Ser Gly Asp Gly Thr
345 350 355

ACA GCG TCA GTA ACT AAA GAT ACT AAC GGC AAT GGC ATC ACT GTT AAG 1278
Thr Ala Ser Val Thr Lys Asp Thr Asn Gly Asn Gly Ile Thr Val Lys
360 365 370

TAC GAC GCG AAA GTT GGC GAC GGC TTG AAA TTT GAT AGC GAT AAA AAA 1326
Tyr Asp Ala Lys Val Gly Asp Gly Leu Lys Phe Asp Ser Asp Lys Lys
375 380 385
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ill Jf* GCA ACT GTG ACA GGT AAG GTA GCT 

He Val Ala Asp Thr Thr ^ Leu Thr Val Thf ^ Qly ^ ^ «T 



395 



400 



GAA ATT GCT AAA GAA GAT GAC AAG AAA AAA CTT GTT AAT GCA GGC GAT 
Glu lie Ala Lys Glu Asp Asp Lys Lys Lys Leu Val £ Ala Gly Asp 

415 420 

lTu Sit If* ™ ** ° TA AGT TGG GCA AAA GCT GAG GCT 

Leu Val Thr Ala Leu Gly Asn Leu Ser Trp Lys Ala Lys Ala Glu All 

425 «0 435 

GAT ACT GAT GGT GCG CTT GAG GGG ATT TCA AAA GAC CAA GAA GTC AAA 
As P Thr Asp Gly Ala Leu Glu Gly He Ser Lys Asp Gin Glu vTl J£ 

440 44c 1 

445 450 



GCA GGC GAA ACG GTA ACC TTT AAA GCG GGC AAG AAC TTA AAA GTG AAA 
Ala Gly Glu Thr Val Thr Phe Lys Ala Gly Lys Asn lIu £ vll £ 

460 465 

G A n ^ ACT TAT TCA CTG C AA GAT GCT TTA ACG GGT 

Gin Asp Gly Ala Asn Phe Thr Tyr Ser Leu Gin Asp Ala Leu i£ Gly 



480 



TTA ACG AGC ATT ACT TTA GGT GGT ACA ACT AAT GGC GGA AAT GAT GCG 
Leu Thr ser He Thr Leu Gly Gly Thr Thr Asn Gly Gly ™ £p La 



495 



500 



£hr Val ^ J""' *** GAC GGT TTA ACC ATC ACG CCA GCA GGT AAT 
Lys Thr Val lie Asn Lys Asp Gly Leu Thr lie Thr Pro Ala Gly Asn 



510 



515 



Gly" Gly Jhr Jhr T T T ^ AGC G7A A " **» ATT 

Gly Gly Thr Thr Gly Thr Asn Thr lie Ser Val Thr Lys Asp Gly lie 

520 "5 530 

AAA GCA GGT AAT AAA GCT ATT ACT AAT GTT GCG AGT GGT TTA AGA GCT 
Lys Ala Gly Asn Lys Ala lie Thr Asn Val Ala Ser Gly Leu Arg Ala 
" 5 540 545 

TAT GAC GAT GCG AAT TTT GAT GTT TTA AAT AAC TCT GCA ACT GAT TTA 
Tyr Asp Asp Ala Asn Phe Asp Val Leu Asn Asn Ser Ala Jhr ts P ™ 

555 560 

AAT AGA CAC GTT GAA GAT GCT TAT AAA GGT TTA TTA AAT CTA AAT GAA 
Asn Arg Hxa Val Glu Asp Ala Tyr Lys Gly Leu Leu Asn Su £1 Glu 

570 "5 580 

AAA AAT GCA AAT AAA CAA CCG TTG GTG ACT GAC AGC ACG GCG GCG ACT 
Lys Asn Ala Asn Lys Gin Pro Leu- Val Thr Asp Ser Thr Ala Ala i£ 
565 590 

GTA GGC GAT TTA CGT AAA TTG GGT TGG GTA GTA TCA ACC AAA AAC GGT 
Val Gly Asp Leu Arg Lys Leu Gly Trp Val Val Ser Thr Lys Asn Gly 
600 60S 610 



1374 



1422 



1470 



1518 



1566 



1614 



1662 



1710 



1758 



1806 



1854 



1902 



1950 



L998 
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615 



620 



ACC GGA GCC GGT GCT GCT ACG GTT ACT TCC AAA TCT GAA AAC GGT AAA 
£ fly Til Gly Ala Ala Thr Val Thr Ser Lys Ser Glu Asn Gly Lys 



630 



635 



CAT ACG ATT ACC GTT AGT GTG GCT GAA ACT AAA GCG GAT TGC GGT CTT 
SI ?hr ne Thr Val Ser Val Ala Glu Thr Lys Ala Asp Cys Gly Leu 



645 



650 



655 



SM „ « J» £ «* £ ™ - ™ S - S « E 

Glu Lys Asp Gly Asp Thr lie vys lbu y *- ^ 



665 670 



TAT AAT GTT TTA ACT GTT GGT AAT AAT GGT ACT GCT GTC ACT AAA GGT 
Z £1 Zl ™ Thr val Gly Asn Asn Gly Thr Ala Val Thr Lys Gly 

685 



680 



GGC TTT GAA ACT GTT AAA ACT GGA GCG ACT GAT GCA GAT CGC GGT AAA 
Gly Tu val Lys Thr Gly Ala Thr Asp Ala Asp Arg Gly Lys 



695 



700 



GTA ACT GTA AAA GAT GCT ACT GCT AAT GAC GCT GAT AAG AAA GTC GCA 
Zl T-" 111 £ Asp Ala Thr Ala Asn Asp Ala Asp Lys Lys Val Ala 



710 



.715 



ACT GTA AAA GAT GTT GCA ACC GCA ATT AAT AGT GCG GCG ACT TTT GTG 
rZ Zl Lys Asp val Ala Thr Ala II. Asn Ser Ala Ala Thr Phe Val 

725 73° 735 

£5£ = = S = "£~"S SS = 

775 780 

rir tt" GCG AAA AAC CTT GAG GTG AAA ACT GCG AAA GTG AGT GAT ACT 
Z Hi £ £ ^ Glu val Lys Thr Ala Lys Val Ser Asp Thr 

790 795 
TTA ACG ATT GGC GGG AAT ACA CCT ACA GGT GGC ACT ACT GCG ACG CCA 
Zu ihr lie Gly Gly Asn Thr Pro Thr Gly Gly Thr Thr Ala Thr Pro 



805 810 



825 830 



2046 



2094 



2142 



2190 



2238 



2286 



2334 



2382 



2430 



2478 



2526 



2574 



2622 



2670 




ACA GCC GAT GCC TCG GGT TCT AAG AAT GTT TAT TTG AAA GGT ATT GCG 2718
Thr Ala Asp Ala Ser Gly Ser Lys Asn Val Tyr Leu Lys Gly Ile Ala
840 845 850

ACA ACT TTA ACT GAG CCA AGC GCG GGA GCG AAG TCT TCA CAC GTT GAT 2766
Thr Thr Leu Thr Glu Pro Ser Ala Gly Ala Lys Ser Ser His Val Asp
855 860 865

TTA AAT GTG GAT GCG ACG AAA AAA TCC AAT GCA GCA AGT ATT GAA GAT 2814
Leu Asn Val Asp Ala Thr Lys Lys Ser Asn Ala Ala Ser Ile Glu Asp
870 875 880

GTA TTG CGC GCA GGT TGG AAT ATT CAA GGT AAT GGT AAT AAT GTT GAT 2862
Val Leu Arg Ala Gly Trp Asn Ile Gln Gly Asn Gly Asn Asn Val Asp
885 890 895 900

TAT GTA GCG ACG TAT GAC ACA GTA AAC TTT ACC GAT GAC AGC ACA GGT 2910
Tyr Val Ala Thr Tyr Asp Thr Val Asn Phe Thr Asp Asp Ser Thr Gly
905 910 915

ACA ACA ACG GTA ACC GTA ACC CAA AAA GCA GAT GGC AAA GGT GCT GAC 2958
Thr Thr Thr Val Thr Val Thr Gln Lys Ala Asp Gly Lys Gly Ala Asp
920 925 930

GTT AAA ATC GGT GCG AAA ACT TCT GTT ATC AAA GAC CAC AAC GGC AAA 3006
Val Lys Ile Gly Ala Lys Thr Ser Val Ile Lys Asp His Asn Gly Lys
935 940 945

CTG TTT ACA GGC AAA GAC CTG AAA GAT GCG AAT AAT GGT GCA ACC GTT 3054
Leu Phe Thr Gly Lys Asp Leu Lys Asp Ala Asn Asn Gly Ala Thr Val
950 955 960

AGT GAA GAT GAT GGC AAA GAC ACC GGC ACA GGC TTA GTT ACT GCA AAA 3102
Ser Glu Asp Asp Gly Lys Asp Thr Gly Thr Gly Leu Val Thr Ala Lys
965 970 975 980

ACT GTG ATT GAT GCA GTA AAT AAA AGC GGT TGG AGG GTA ACC GGT GAG 3150
Thr Val Ile Asp Ala Val Asn Lys Ser Gly Trp Arg Val Thr Gly Glu
985 990 995

GGC GCG ACT GCC GAA ACC GGT GCA ACC GCC GTG AAT GCG GGT AAC GCT 3198
Gly Ala Thr Ala Glu Thr Gly Ala Thr Ala Val Asn Ala Gly Asn Ala
1000 1005 1010

GAA ACC GTT ACA TCA GGC ACG AGC GTG AAC TTC AAA AAC GGC AAT GCG 3246
Glu Thr Val Thr Ser Gly Thr Ser Val Asn Phe Lys Asn Gly Asn Ala
1015 1020 1025

ACC ACA GCG ACC GTA AGC AAA GAT AAT GGC AAC ATC AAT GTC AAA TAC 3294
Thr Thr Ala Thr Val Ser Lys Asp Asn Gly Asn Ile Asn Val Lys Tyr
1030 1035 1040

GAT GTA AAT GTT GGT GAC GGC TTG AAG ATT GGC GAT GAC AAA AAA ATC 3342
Asp Val Asn Val Gly Asp Gly Leu Lys Ile Gly Asp Asp Lys Lys Ile
1045 1050 1055 1060
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SESEEESESSSSEEEE 

1065 107U 

Pro Ala uly Aia nsn 10g0 
1080 108b 

EESSEEEEEEESESEE 

1095 1100 

sseeseeesseeeees 

1110 1115 

« »rv* ttt AAA GCA GGC AAG AAC TTA AAA GTG 

E S E - E E - - - «• « V - - So 

1125 1130 

_ ^ ACX XAT TCA CTG CAA GAC ACT TTA ACA 

E e e e e E s s ^ s « r «. ~ «. * 

1145 1150 

GGC TTA ACG AGC ATT ACT TTA GGT GGT ACA GCT AAT GGC AGA AAT GAT 
Gly Leu Thr Ser Ile Thr Leu Gly Gly Thr Ala Asn Gly Arg Asn Asp
1160 1165 
*rr PTC ATC AAC AAA GAC GGC TTA ACC ATC ACG CTG GCA AAT 

£ E E E E E E «P «r - «• »• ™ "° 

1180 i*o 3 

S SEE S E EE E E E EE S E S 
E EE 5 5 E S S E E EE E E E E 

1210 i*x3 

EE S E E S E S E EE E S S EE 

1225 12JU 

CAA GAT AAA GAG TTC CAC GCC GCC GTT AAA AAC GCA AAT GAA GTT GAG
Gln Asp Lys Glu Phe His Ala Ala Val Lys Asn Ala Asn Glu Val Glu
1240 1245

TTC GTG GGT AAA AAC GGT GCA ACC GTG TCT GCA AAA ACT GAT AAC AAC
Phe Val Gly Lys Asn Gly Ala Thr Val Ser Ala Lys Thr Asp Asn Asn

S E EE E E E S E S E EE E E S 



3390 



3438 



3486



3534 



3582 



3630 



3678 



3726 



3774 



3822 



3870 



3918 



3966 



4014 



1270 1275

GGT CTT GAA AAA GAT ACT GAC GGC AAG ATT AAA CTC AAA GTA GAT AAT 4062
Gly Leu Glu Lys Asp Thr Asp Gly Lys Ile Lys Leu Lys Val Asp Asn
1285 1290 1295 1300

ACA GAT GGG AAT AAT CTA TTA ACC GTT GAT GCA ACA AAA GGT GCA TCC 4110 
Thr Asp Gly Asn Asn Leu Leu Thr Val Asp Ala Thr Lys Gly Ala Ser 
1305 1310 1315 

GTT GCC AAG GGC GAG TTT AAT GCC GTA ACA ACA GAT GCA ACT ACA GCC 4158 
Val Ala Lys Gly Glu Phe Asn Ala Val Thr Thr Asp Ala Thr Thr Ala 
1320 1325 1330 

CAA GGC ACA AAT GCC AAT GAG CGC GGT AAA GTG GTT GTC AAG GGT TCA 4206 
Gln Gly Thr Asn Ala Asn Glu Arg Gly Lys Val Val Val Lys Gly Ser
1335 1340 1345 

AAT GGT GCA ACT GCT ACC GAA ACT GAC AAG AAA AAA GTG GCA ACT GTT 4254
Asn Gly Ala Thr Ala Thr Glu Thr Asp Lys Lys Lys Val Ala Thr Val
1350 1355 1360

GGC GAC GTT GCT AAA GCG ATT AAC GAC GCA GCA ACT TTC GTG AAA GTG 4302
Gly Asp Val Ala Lys Ala Ile Asn Asp Ala Ala Thr Phe Val Lys Val
1365 1370 1375 1380

GAA AAT GAC GAC AGT GCT ACG ATT GAT GAT AGC CCA ACA GAT GAT GGC 4350
Glu Asn Asp Asp Ser Ala Thr Ile Asp Asp Ser Pro Thr Asp Asp Gly
1385 1390 1395 

GCA AAT GAT GCT CTC AAA GCA GGC GAC ACC TTG ACC TTA AAA GCG GGT 4398 
Ala Asn Asp Ala Leu Lys Ala Gly Asp Thr Leu Thr Leu Lys Ala Gly 
1400 1405 1410 

AAA AAC TTA AAA GTT AAA CGT GAT GGT AAA AAT ATT ACT TTT GCC CTT 4446
Lys Asn Leu Lys Val Lys Arg Asp Gly Lys Asn Ile Thr Phe Ala Leu
1415 1420 1425

GCG AAC GAC CTT AGT GTA AAA AGC GCA ACC GTT AGC GAT AAA TTA TCG 4494
Ala Asn Asp Leu Ser Val Lys Ser Ala Thr Val Ser Asp Lys Leu Ser
1430 1435 1440

CTT GGT ACA AAC GGC AAT AAA GTC AAT ATC ACA AGC GAC ACC AAA GGC 4542
Leu Gly Thr Asn Gly Asn Lys Val Asn Ile Thr Ser Asp Thr Lys Gly
1445 1450 1455 1460

TTG AAC TTC GCT AAA GAT AGT AAG ACA GGC GAT GAT GCT AAT ATT CAC 4590
Leu Asn Phe Ala Lys Asp Ser Lys Thr Gly Asp Asp Ala Asn Ile His
1465 1470 1475

TTA AAT GGC ATT GCT TCA ACT TTA ACT GAT ACA TTG TTA AAT AGT GGT 4638
Leu Asn Gly Ile Ala Ser Thr Leu Thr Asp Thr Leu Leu Asn Ser Gly
1480 1485 1490

GCG ACA ACC AAT TTA GGT GGT AAT GGT ATT ACT GAT AAC GAG AAA AAA 4686
Ala Thr Thr Asn Leu Gly Gly Asn Gly Ile Thr Asp Asn Glu Lys Lys
1495 1500 1505




CGC GCG GCG AGC GTT AAA GAT GTC TTG AAT GCG GGT TGG AAT GTT CGT 4734
Arg Ala Ala Ser Val Lys Asp Val Leu Asn Ala Gly Trp Asn Val Arg
1510 1515 1520

GGT GTT AAA CCG GCA TCT GCA AAT AAT CAA GTG GAG AAT ATC GAC TTT 4782
Gly Val Lys Pro Ala Ser Ala Asn Asn Gln Val Glu Asn Ile Asp Phe
1525 1530 1535 1540

GTA GCA ACC TAG GAC ACA GTG GAC TTT GTT AGT GGA GAT AAA GAC ACC 4830 
Val Ala Thr Tyr Asp Thr Val Asp Phe Val Ser Gly Asp Lys Asp Thr 
1545 1550 1555 

ACG AGT GTA ACT GTT GAA AGT AAA GAT AAT GGC AAG AGA ACC GAA GTT 4878 
Thr Ser Val Thr Val Glu Ser Lys Asp Asn Gly Lys Arg Thr Glu Val 
1560 1565 1570 

AAA ATC GGT GCG AAG ACT TCT GTT ATC AAA GAC CAC AAC GGC AAA CTG 4926
Lys Ile Gly Ala Lys Thr Ser Val Ile Lys Asp His Asn Gly Lys Leu
1575 1580 1585

TTT ACA GGC AAA GAG CTG AAG GAT GCT AAC AAT AAT GGC GTA ACT GTT 4974
Phe Thr Gly Lys Glu Leu Lys Asp Ala Asn Asn Asn Gly Val Thr Val
1590 1595 1600

ACC GAA ACC GAC GGC AAA GAC GAG GGT AAT GGT TTA GTG ACT GCA AAA 5022 
Thr Glu Thr Asp Gly Lys Asp Glu Gly Asn Gly Leu Val Thr Ala Lys 
1605 1610 1615 1620 

GCT GTG ATT GAT GCC GTG AAT AAG GCT GGT TGG AGA GTT AAA ACA ACA 5070
Ala Val Ile Asp Ala Val Asn Lys Ala Gly Trp Arg Val Lys Thr Thr
1625 1630 1635

GGT GCT AAT GGT CAG AAT GAT GAC TTC GCA ACT GTT GCG TCA GGC ACA 5118
Gly Ala Asn Gly Gln Asn Asp Asp Phe Ala Thr Val Ala Ser Gly Thr
1640 1645 1650

AAT GTA ACC TTT GCT GAT GGT AAT GGC ACA ACT GCC GAA GTA ACT AAA 5166 
Asn Val Thr Phe Ala Asp Gly Asn Gly Thr Thr Ala Glu Val Thr Lys 
1655 1660 1665 

GCA AAC GAC GGT AGT ATT ACT GTT AAA TAC AAT GTT AAA GTG GCT GAT 5214
Ala Asn Asp Gly Ser Ile Thr Val Lys Tyr Asn Val Lys Val Ala Asp
1670 1675 1680

GGC TTA AAA CTA GAC GGC GAT AAA ATC GTT GCA GAC ACG ACC GTA CTT 5262
Gly Leu Lys Leu Asp Gly Asp Lys Ile Val Ala Asp Thr Thr Val Leu
1685 1690 1695 1700

ACT GTG GCA GAT GGT AAA GTT ACA GCT CCG AAT AAT GGC GAT GGT AAG 5310
Thr Val Ala Asp Gly Lys Val Thr Ala Pro Asn Asn Gly Asp Gly Lys
1705 1710 1715

AAA TTT GTT GAT GCA AGT GGT TTA GCG GAT GCG TTA AAT AAA TTA AGC 5358
Lys Phe Val Asp Ala Ser Gly Leu Ala Asp Ala Leu Asn Lys Leu Ser
1720 1725 1730

TGG ACG GCA ACT GCT GGT AAA GAA GGC ACT GGT GAA GTT GAT CCT GCA 5406
Trp Thr Ala Thr Ala Gly Lys Glu Gly Thr Gly Glu Val Asp Pro Ala
1735 1740 1745

AAT TCA GCA GGG CAA GAA GTC AAA GCG GGC GAC AAA GTA ACC TTT AAA 5454
Asn Ser Ala Gly Gln Glu Val Lys Ala Gly Asp Lys Val Thr Phe Lys
1750 1755 1760

GCC GGC GAC AAC CTG AAA ATC AAA CAA AGC GGC AAA GAC TTT ACC TAC 5502
Ala Gly Asp Asn Leu Lys Ile Lys Gln Ser Gly Lys Asp Phe Thr Tyr
1765 1770 1775 1780

TCG CTG AAA AAA GAG CTG AAA GAC CTG ACC AGC GTA GAG TTC AAA GAC 5550 
Ser Leu Lys Lys Glu Leu Lys Asp Leu Thr Ser Val Glu Phe Lys Asp 
1785 1790 1795 

GCA AAC GGC GGT ACA GGC AGT GAA AGC ACC AAG ATT ACC AAA GAC GGC 5598
Ala Asn Gly Gly Thr Gly Ser Glu Ser Thr Lys Ile Thr Lys Asp Gly
1800 1805 1810

TTG ACC ATT ACG CCG GCA AAC GGT GCG GGT GCG GCA GGT GCA AAC ACT 5646
Leu Thr Ile Thr Pro Ala Asn Gly Ala Gly Ala Ala Gly Ala Asn Thr
1815 1820 1825

GCA AAC ACC ATT AGC GTA ACC AAA GAT GGC ATT AGC GCG GGT AAT AAA 5694
Ala Asn Thr Ile Ser Val Thr Lys Asp Gly Ile Ser Ala Gly Asn Lys
1830 1835 1840

GCA GTT ACA AAC GTT GTG AGC GGA CTG AAG AAA TTT GGT GAT GGT CAT 5742
Ala Val Thr Asn Val Val Ser Gly Leu Lys Lys Phe Gly Asp Gly His
1845 1850 1855 1860

ACG TTG GCA AAT GGC ACT GTT GCT GAT TTT GAA AAG CAT TAT GAC AAT 5790 
Thr Leu Ala Asn Gly Thr Val Ala Asp Phe Glu Lys His Tyr Asp Asn 
1865 1870 1875 

GCC TAT AAA GAC TTG ACC AAT TTG GAT GAA AAA GGC GCG GAT AAT AAT 5838
Ala Tyr Lys Asp Leu Thr Asn Leu Asp Glu Lys Gly Ala Asp Asn Asn
1880 1885 1890

CCG ACT GTT GCC GAC AAT ACC GCT GCA ACC GTG GGC GAT TTG CGC GGC 5886 
Pro Thr Val Ala Asp Asn Thr Ala Ala Thr Val Gly Asp Leu Arg Gly 
1895 1900 1905 

TTG GGC TGG GTC ATT TCT GCG GAC AAA ACC ACA GGC GAA CCC AAT CAG 5934
Leu Gly Trp Val Ile Ser Ala Asp Lys Thr Thr Gly Glu Pro Asn Gln
1910 1915 1920

GAA TAC AAC GCG CAA GTG CGT AAC GCC AAT GAA GTG AAA TTC AAG AGC 5982
Glu Tyr Asn Ala Gln Val Arg Asn Ala Asn Glu Val Lys Phe Lys Ser
1925 1930 1935 1940

GGC AAC GGT ATC AAT GTT TCC GGT AAA ACA TTG AAC GGT ACG CGC GTG 6030
Gly Asn Gly Ile Asn Val Ser Gly Lys Thr Leu Asn Gly Thr Arg Val
1945 1950 1955

SEEEEESSEEESEEEE

e e s e e s s s e £ ee ee " - 

1975 1980 
GAT ATG TAT TAC AGC AAA GAG GAT ATT GAC CCG GCA ACC AGT AAA CCG 
Asp Met Tyr Tyr Ser Lys Glu Asp Ile Asp Pro Ala Thr Ser Lys Pro

1990 1995 
ATG ACA GGT AAA ACT GAA AAA TAT AAG GTT GAA AAC GGC AAA GTC GTT 
Met Thr Gly Lys Thr Glu Lys Tyr Lys Val Glu Asn Gly Lys Val Val
2005 2010 2015

r » e e e E £ e s s s £ e E e e 

Ser Ala Asn Gly ser uys 2035 
2025 2030 

GGC TAT GTA ACA GGT AAC CAA GTG GCT GAT GCG ATT GCG AAA TCA GGC 
Gly Tyr Val Thr Gly Asn Gln Val Ala Asp Ala Ile Ala Lys Ser Gly
2040 2045 

SSEEEEEEEEEESEES 

2055 2060 2065 

GAA AGC GCA AAA GAC AAG CAA TTG TCT AAA GAT AAA GCG GAA ACT GTA 
Glu Ser Ala Lys Asp Lys Gln Leu Ser Lys Asp Lys Ala Glu Thr Val
2070 2075 2080 

AAT GCC CAC GAT AAA GTC CGT TTT GCT AAT GGT TTA AAT ACC AAA GTG
Asn Ala His Asp Lys Val Arg Phe Ala Asn Gly Leu Asn Thr Lys Val
2085 2090 2095

E S E £ EE E S S EE E E £ E E 

E £ E E E £ E E E ™ E S £ EE E 

2120 2125 

_ n«/- nTr PTT AAA AAA GCT GAC GGA AAA 

E £ asp S E E E E E E E S; «p «» - 

2135 2140 

TGG TAT GAA CTG AAT GCT GAT OCT JOB GCG JGT AAC AAA GAA GTG ACA 
Trp Tyr Glu Leu Asn Ala Asp Gly Thr Ala Ser Asn Lys Glu Val Thr

2150 2155 
CTT OCT AAC CTC 0»T OCA AAC COT AAC AAA OTT CTC AAA OTA ACC CAA 

Leu Gly Asn Val Asp Ala Asn Gly Lys Lys Valval Lys ^ 



6078 



6126 



6174 



6222 



6270 



6318 



6366 



6414 



6462 



6510 



6558 



6606 



6654 



6702

AAT GGT GCG GAT AAG TGG TAT TAC ACC AAT GCT GAC GGT GCT GCG GAT 6750
Asn Gly Ala Asp Lys Trp Tyr Tyr Thr Asn Ala Asp Gly Ala Ala Asp
2185 2190 2195

AAA ACC AAA GGC GAA GTG AGC AAT GAT AAA GTT TCT ACC GAT GAA AAA 6798 
Lys Thr Lys Gly Glu Val Ser Asn Asp Lys Val Ser Thr Asp Glu Lys 
2200 2205 2210 

CAC GTT GTC CGC CTT GAT CCG AAC AAT CAA TCG AAC GGC AAA GGC GTG 6846
His Val Val Arg Leu Asp Pro Asn Asn Gln Ser Asn Gly Lys Gly Val
2215 2220 2225

GTC ATT GAC AAT GTG GCT AAT GGC GAA ATT TCT GCC ACT TCC ACC GAT 6894
Val Ile Asp Asn Val Ala Asn Gly Glu Ile Ser Ala Thr Ser Thr Asp
2230 2235 2240

GCG ATT AAC GGA AGT CAG TTG TAT GCC GTG GCA AAA GGG GTA ACA AAC 6942
Ala Ile Asn Gly Ser Gln Leu Tyr Ala Val Ala Lys Gly Val Thr Asn
2245 2250 2255 2260

CTT GCT GGA CAA GTG AAT AAT CTT GAG GGC AAA GTG AAT AAA GTG GGC 6990
Leu Ala Gly Gln Val Asn Asn Leu Glu Gly Lys Val Asn Lys Val Gly
2265 2270 2275

AAA CGT GCA GAT GCA GGT ACA GCA AGT GCA TTA GCG GCT TCA CAG TTA 7038
Lys Arg Ala Asp Ala Gly Thr Ala Ser Ala Leu Ala Ala Ser Gln Leu
2280 2285 2290

CCA CAA GCC ACT ATG CCA GGT AAA TCA ATG GTT GCT ATT GCG GGA AGT 7086
Pro Gln Ala Thr Met Pro Gly Lys Ser Met Val Ala Ile Ala Gly Ser
2295 2300 2305

AGT TAT CAA GGT CAA AAT GGT TTA GCT ATC GGG GTA TCA AGA ATT TCC 7134
Ser Tyr Gln Gly Gln Asn Gly Leu Ala Ile Gly Val Ser Arg Ile Ser
2310 2315 2320

GAT AAT GGC AAA GTG ATT ATT CGC TTG TCA GGC ACA ACC AAT AGT CAA 7182
Asp Asn Gly Lys Val Ile Ile Arg Leu Ser Gly Thr Thr Asn Ser Gln
2325 2330 2335 2340

GGT AAA ACA GGC GTT GCA GCA GGT GTT GGT TAC CAG TGG TAAAGTTTGG 7231
Gly Lys Thr Gly Val Ala Ala Gly Val Gly Tyr Gln Trp
2345 2350

ATTATCTCTC TTAAAAAGCG GCATTTGCCG CTTTTTTTAT GGGTGGCTAT TATGTATCGT 7291



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2353 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
« M. U. -J » ~ ,U *P » v., ~ ~ - - «P 

« V,! *1 «. ol » - « « «' «" ^ ^ "30 S " 

20 

», v., «. - - v., U. « * - ~ - - - ~ ~ 

35 



Ala „ «. - «P «. « - «« - - °Z " 3 ™ 

I « ^ ~ - «. - « - «: 5iy Thr sly 010 To 

70 

65 

Glu Val Thr Glu Asn Ser Asn Trp Gly Ile Tyr Phe Asp Asn Lys Gly
85

Val Leu Lys Ala Gly Ala Ile Thr Leu Lys Ala Gly Asp Asn Leu Lys
100 110

Ile Ly , «. » -P «- - - - - ~ - * ~ 

115 

« ^ »p - - - - ™ s - is Ala ™ G1 " 
wu Z «» «, - » «» « « s ll * TKr set Mp «• 

145 150 



L « u W5 ~ «• « « » val His 5S 

t" Leu « «P «• V«! ™ «. C1Y V! « 



Asn Gly 

165 



Gly Leu Asp Ser Thr beu ^ ~r 19C 
ser MI se, s« «. - « » « - «» - S'. " 9 "* AU 

w v,i " -p v., - - - °» - - & Wi Gly " a " vs 

Thr T. 01, * - 1 •« - - £ ^ ~ M " ^ " 

I v. - ne 01, « » - - « - £ - 

245 

Thr Ala Lys Glu Asn Gly Lys Thr Thr Glu Val Lys Phe Thr Pro Lys
260 270

Thr Ser Val Ile Lys Glu Lys Asp Gly Lys Leu Phe Thr Gly Lys Glu
275

Asn Asn Asp Thr Asn Lys Val Thr Ser Asn Thr Ala Thr Asp Asn Thr
290 295 300

Asp Glu Gly Asn Gly Leu Val Thr Ala Lys Ala Val Ile Asp Ala Val
305 310 315 320

Asn Lys Ala Gly Trp Arg Val Lys Thr Thr Thr Ala Asn Gly Gln Asn
325 330 335

Gly Asp Phe Ala Thr Val Ala Ser Gly Thr Asn Val Thr Phe Glu Ser
340 345 350

Gly Asp Gly Thr Thr Ala Ser Val Thr Lys Asp Thr Asn Gly Asn Gly
355 360 365

Ile Thr Val Lys Tyr Asp Ala Lys Val Gly Asp Gly Leu Lys Phe Asp
370 375 380

Ser Asp Lys Lys Ile Val Ala Asp Thr Thr Ala Leu Thr Val Thr Gly
385 390 395 400

Gly Lys Val Ala Glu Ile Ala Lys Glu Asp Asp Lys Lys Lys Leu Val
405 410 415

Asn Ala Gly Asp Leu Val Thr Ala Leu Gly Asn Leu Ser Trp Lys Ala
420 425 430

Lys Ala Glu Ala Asp Thr Asp Gly Ala Leu Glu Gly Ile Ser Lys Asp
435 440 445

Gln Glu Val Lys Ala Gly Glu Thr Val Thr Phe Lys Ala Gly Lys Asn
450 455 460

Leu Lys Val Lys Gln Asp Gly Ala Asn Phe Thr Tyr Ser Leu Gln Asp
465 470 475 480

Ala Leu Thr Gly Leu Thr Ser Ile Thr Leu Gly Gly Thr Thr Asn Gly
485 490 495

Gly Asn Asp Ala Lys Thr Val Ile Asn Lys Asp Gly Leu Thr Ile Thr
500 505 510

Pro Ala Gly Asn Gly Gly Thr Thr Gly Thr Asn Thr Ile Ser Val Thr
515 520 525

Lys Asp Gly Ile Lys Ala Gly Asn Lys Ala Ile Thr Asn Val Ala Ser
530 535 540

Gly Leu Arg Ala Tyr Asp Asp Ala Asn Phe Asp Val Leu Asn Asn Ser
545 550 555 560

Ala Thr Asp Leu Asn Arg His Val Glu Asp Ala Tyr Lys Gly Leu Leu
565 570 575

Asn Leu Asn Glu Lys Asn Ala Asn Lys Gln Pro Leu Val Thr Asp Ser
580 585 590

Thr Ala Ala Thr Val Gly Asp Leu Arg Lys Leu Gly Trp Val Val Ser
595 600 605

Thr Lys Asn Gly Thr Lys Glu Glu Ser Asn Gln Val Lys Gln Ala Asp
610 615 620

Glu Val Leu Phe Thr Gly Ala Gly Ala Ala Thr Val Thr Ser Lys Ser
625 630 635

Glu Asn Gly Lys His Thr Ile Thr Val Ser Val Ala Glu Thr Lys Ala
645 650

Asp Cys Gly Leu Glu Lys Asp Gly Asp Thr Ile Lys Leu Lys Val Asp
660 665 670

Asn Gln Asn Thr Asp Asn Val Leu Thr Val Gly Asn Asn Gly Thr Ala
675 680 685

Val Thr Lys Gly Gly Phe Glu Thr Val Lys Thr Gly Ala Thr Asp Ala
690 695

Asp Arg Gly Lys Val Thr Val Lys Asp Ala Thr Ala Asn Asp Ala Asp
705 710 715

Lys Lys Val Ala Thr Val Lys Asp Val Ala Thr Ala Ile Asn Ser Ala
725 730

Ala Thr Phe Val Lys Thr Glu Asn Leu Thr Thr Ser Ile Asp Glu Asp
740 745 750

Asn Pro Thr Asp Asn Gly Lys Asp Asp Ala Leu Lys Ala Gly Asp Thr
755 760 765

Leu Thr Phe Lys Ala Gly Lys Asn Leu Lys Val Lys Arg Asp Gly Lys
770 775 780

Asn Ile Thr Phe Asp Leu Ala Lys Asn Leu Glu Val Lys Thr Ala Lys
785 790 795

Val Ser Asp Thr Leu Thr Ile Gly Gly Asn Thr Pro Thr Gly Gly Thr
805 810

Thr Ala Thr Pro Lys Val Asn Ile Thr Ser Thr Ala Asp Gly Leu Asn
820 825

Phe Ala Lys Glu Thr Ala Asp Ala Ser Gly Ser Lys Asn Val Tyr Leu
835 840

Lys Gly Ile Ala Thr Thr Leu Thr Glu Pro Ser Ala Gly Ala Lys Ser
850 855

Ser His Val Asp Leu Asn Val Asp Ala Thr Lys Lys Ser Asn Ala Ala
865 870 875

Ser Ile Glu Asp Val Leu Arg Ala Gly Trp Asn Ile Gln Gly Asn Gly
885

Asn Asn Val Asp Tyr Val Ala Thr Tyr Asp Thr Val Asn Phe Thr Asp
900 905 910

Asp Ser Thr Gly Thr Thr Thr Val Thr Val Thr Gln Lys Ala Asp Gly
915 920 925

Lys Gly Ala Asp Val Lys Ile Gly Ala Lys Thr Ser Val Ile Lys Asp
930 935 940

His Asn Gly Lys Leu Phe Thr Gly Lys Asp Leu Lys Asp Ala Asn Asn
945 950 955 960

Gly Ala Thr Val Ser Glu Asp Asp Gly Lys Asp Thr Gly Thr Gly Leu
965 970 975

Val Thr Ala Lys Thr Val Ile Asp Ala Val Asn Lys Ser Gly Trp Arg
985 990

Val Thr Gly Glu Gly Ala Thr Ala Glu Thr Gly Ala Thr Ala Val Asn
995 1000 1005

Ala Gly Asn Ala Glu Thr Val Thr Ser Gly Thr Ser Val Asn Phe Lys
1010 1015 1020

Asn Gly Asn Ala Thr Thr Ala Thr Val Ser Lys Asp Asn Gly Asn Ile
1025 1030 1035 1040

Asn Val Lys Tyr Asp Val Asn Val Gly Asp Gly Leu Lys Ile Gly Asp
1045 1050 1055

Asp Lys Lys Ile Val Ala Asp Thr Thr Thr Leu Thr Val Thr Gly Gly
1060 1065 1070

Lys Val Ser Val Pro Ala Gly Ala Asn Ser Val Asn Asn Asn Lys Lys
1075 1080 1085

Leu Val Asn Ala Glu Gly Leu Ala Thr Ala Leu Asn Asn Leu Ser Trp
1090 1095 1100

Thr Ala Lys Ala Asp Lys Tyr Ala Asp Gly Glu Ser Glu Gly Glu Thr
1105 1110 1115 1120

Asp Gln Glu Val Lys Ala Gly Asp Lys Val Thr Phe Lys Ala Gly Lys
1125 1130 1135

Asn Leu Lys Val Lys Gln Ser Glu Lys Asp Phe Thr Tyr Ser Leu Gln
1140 1145 1150

Asp Thr Leu Thr Gly Leu Thr Ser Ile Thr Leu Gly Gly Thr Ala Asn
1155 1160 1165

Gly Arg Asn Asp Thr Gly Thr Val Ile Asn Lys Asp Gly Leu Thr Ile
1170 1175 1180

Thr Leu Ala Asn Gly Ala Ala Ala Gly Thr Asp Ala Ser Asn Gly Asn
1185 1190 1195 1200




Thr Ile Ser Val Thr Lys Asp Gly Ile Ser Ala Gly Asn Lys Glu Ile
1205 1210

Thr Asn v.! ,y= M. AU Thr Tyr ,y. Asp Thr Oln » Thr 

1220 122b 
Ala Asp Glu Thr Gln Asp Lys Glu Phe His Ala Ala Val Lys Asn Ala
1235 1240

Asn Glu Val Glu Phe Val Gly Lys Asn Gly Ala Thr Val Ser Ala Lys
1250 1255

Thr Asp Asn Ash <Hy W «* ™ l ^ " St> ^ "* ^0 

1 7*70 1^/3 

Lys Val Gly Asp Gly Leu Glu Lys Asp Thr Asp Gly Lys Ile Lys Leu
1285 1290

Lys Val Asp Asn Thr Asp Gly Asn Asn Leu Leu Thr Val Asp Ala Thr
1300 1305

Lys Gly Ala Ser Val Ala Lys Gly Glu Phe Asn Ala Val Thr Thr Asp
1315 1320

Ala Thr Thr Ala Gln Gly Thr Asn Ala Asn Glu Arg Gly Lys Val Val
1330 1335

Val Lys Gly Ser Asn Gly Ala Thr Ala Thr Glu Thr Asp Lys Lys Lys
1345

Val Ala Thr Val Gly Asp Val Ala Lys Ala Ile Asn Asp Ala Ala Thr
1365 1370

Phe Val Lys Val Glu Asn Asp Asp Ser Ala Thr Ile Asp Asp Ser Pro
1380 1385

Thr Asp Asp Gly Ala Asn Asp Ala Leu Lys Ala Gly Asp Thr Leu Thr
1395 1400

Leu Lys Ala Gly Lys Asn Leu Lys Val Lys Arg Asp Gly Lys Asn Ile
1410 1415

Thr Phe Ala Leu Ala Asn Asp Leu Ser Val Lys Ser Ala Thr Val Ser
1425 1430 1435

Asp Lys Leu Ser Leu Gly Thr Asn Gly Asn Lys Val Asn Ile Thr Ser
1445

Asp Thr Lys Gly Leu Asn Phe Ala Lys Asp Ser Lys Thr Gly Asp Asp
1460 1465



Ala Asn Ile His Leu Asn Gly Ile Ala Ser Thr Leu Thr Asp Thr Leu
1475 1480

Leu Asn Ser Gly Ala Thr Thr Asn Leu Gly Gly Asn Gly Ile Thr Asp
1490 1495

Asn Glu Lys Lys Arg Ala Ala Ser Val Lys Asp Val Leu Asn Ala Gly
1505 1510 1515 1520

Trp Asn Val Arg Gly Val Lys Pro Ala Ser Ala Asn Asn Gln Val Glu
1525 1530 1535

Asn Ile Asp Phe Val Ala Thr Tyr Asp Thr Val Asp Phe Val Ser Gly
1540 1545 1550

Asp Lys Asp Thr Thr Ser Val Thr Val Glu Ser Lys Asp Asn Gly Lys
1555 1560 1565

Arg Thr Glu Val Lys Ile Gly Ala Lys Thr Ser Val Ile Lys Asp His
1570 1575 1580

Asn Gly Lys Leu Phe Thr Gly Lys Glu Leu Lys Asp Ala Asn Asn Asn 
1585 1590 1595 1600 

Gly Val Thr Val Thr Glu Thr Asp Gly Lys Asp Glu Gly Asn Gly Leu
1605 1610 1615

Val Thr Ala Lys Ala Val Ile Asp Ala Val Asn Lys Ala Gly Trp Arg
1620 1625 1630

Val Lys Thr Thr Gly Ala Asn Gly Gln Asn Asp Asp Phe Ala Thr Val
1635 1640 1645

Ala Ser Gly Thr Asn Val Thr Phe Ala Asp Gly Asn Gly Thr Thr Ala
1650 1655 1660

Glu Val Thr Lys Ala Asn Asp Gly Ser Ile Thr Val Lys Tyr Asn Val
1665 1670 1675 1680

Lys Val Ala Asp Gly Leu Lys Leu Asp Gly Asp Lys Ile Val Ala Asp
1685 1690 1695

Thr Thr Val Leu Thr Val Ala Asp Gly Lys Val Thr Ala Pro Asn Asn
1700 1705 1710

Gly Asp Gly Lys Lys Phe Val Asp Ala Ser Gly Leu Ala Asp Ala Leu
1715 1720 1725

Asn Lys Leu Ser Trp Thr Ala Thr Ala Gly Lys Glu Gly Thr Gly Glu 
1730 1735 1740 

Val Asp Pro Ala Asn Ser Ala Gly Gln Glu Val Lys Ala Gly Asp Lys
1745 1750 1755 1760

Val Thr Phe Lys Ala Gly Asp Asn Leu Lys Ile Lys Gln Ser Gly Lys
1765 1770 1775

Asp Phe Thr Tyr Ser Leu Lys Lys Glu Leu Lys Asp Leu Thr Ser Val
1780 1785 1790

Glu Phe Lys Asp Ala Asn Gly Gly Thr Gly Ser Glu Ser Thr Lys Ile
1795 1800 1805

Thr Lys Asp Gly Leu Thr Ile Thr Pro Ala Asn Gly Ala Gly Ala Ala
1810 1815 1820

Gly Ala Asn Thr Ala Asn Thr Ile Ser Val Thr Lys Asp Gly Ile Ser
1825 1830 1835 1840

Ala Gly Asn Lys Ala Val Thr Asn Val Val Ser Gly Leu Lys Lys Phe
1845 1850 1855

Gly Asp Gly His Thr Leu Ala Asn Gly Thr Val Ala Asp Phe Glu Lys
1860 1865 1870

His Tyr Asp Asn Ala Tyr Lys Asp Leu Thr Asn Leu Asp Glu Lys Gly
1875 1880 1885

Ala Asp Asn Asn Pro Thr Val Ala Asp Asn Thr Ala Ala Thr Val Gly 
1890 1895 1900 

Asp Leu Arg Gly Leu Gly Trp Val Ile Ser Ala Asp Lys Thr Thr Gly
1905 1910 1915 1920

Glu Pro Asn Gln Glu Tyr Asn Ala Gln Val Arg Asn Ala Asn Glu Val
1925 1930 1935

Lys Phe Lys Ser Gly Asn Gly Ile Asn Val Ser Gly Lys Thr Leu Asn
1940 1945 1950

Gly Thr Arg Val Ile Thr Phe Glu Leu Ala Lys Gly Glu Val Val Lys
1955 1960 1965

Ser Asn Glu Phe Thr Val Lys Asn Ala Asp Gly Ser Glu Thr Asn Leu
1970 1975 1980

Val Lys Val Gly Asp Met Tyr Tyr Ser Lys Glu Asp Ile Asp Pro Ala
1985 1990 1995 2000

Thr Ser Lys Pro Met Thr Gly Lys Thr Glu Lys Tyr Lys Val Glu Asn
2005 2010 2015

Gly Lys Val Val Ser Ala Asn Gly Ser Lys Thr Glu Val Thr Leu Thr
2020 2025 2030

Asn Lys Gly Ser Gly Tyr Val Thr Gly Asn Gln Val Ala Asp Ala Ile
2035 2040 2045

Ala Lys Ser Gly Phe Glu Leu Gly Leu Ala Asp Ala Ala Glu Ala Glu
2050 2055 2060

Lys Ala Phe Ala Glu Ser Ala Lys Asp Lys Gln Leu Ser Lys Asp Lys
2065 2070 2075 2080

Ala Glu Thr Val Asn Ala His Asp Lys Val Arg Phe Ala Asn Gly Leu
2085 2090 2095

Asn Thr Lys Val Ser Ala Ala Thr Val Glu Ser Thr Asp Ala Asn Gly
2100 2105 2110

*. «. nr* - Tte Phe ^ J, ^ ^ ^ ^
- £ „. Tyr Thr ^ ^ My ^""^ ^ ^ 

^ «„ 4. „ ^ „„ Leu As „ Ala m> s ^ ^ 

«. v., Thr s Asn Uil Asp ™ Lys Lye val 17 

2170 2175 

Lys Val Thr Glu Asn Gly Ala Asp Lys Trp Tyr Tyr Thr Asn Ala Asp
2180 2185 2190

Gly Ala Ala Asp Lys Thr Lys Gly Glu Val Ser Asn Asp Lys Val Ser
2195 2200 2205

Thr Asp Glu Lys His Val Val Arg Leu Asp Pro Asn Asn Gln Ser Asn
2210 2215 2220

Gly Lys Gly Val Val Ile Asp Asn Val Ala Asn Gly Glu Ile Ser Ala
2225 2230 2235 2240

Thr Ser Thr Asp Ala Ile Asn Gly Ser Gln Leu Tyr Ala Val Ala Lys
2245 2250 2255

Gly Val Thr Asn Leu Ala Gly Gln Val Asn Asn Leu Glu Gly Lys Val
2260 2265 2270

Asn Lys Val Gly Lys Arg Ala Asp Ala Gly Thr Ala Ser Ala Leu Ala
2275 2280 2285

Ala Ser Gln Leu Pro Gln Ala Thr Met Pro Gly Lys Ser Met Val Ala
2290 2295 2300

Ile Ala Gly Ser Ser Tyr Gln Gly Gln Asn Gly Leu Ala Ile Gly Val
2305 2310 2315 2320

Ser Arg Ile Ser Asp Asn Gly Lys Val Ile Ile Arg Leu Ser Gly Thr
2325 2330 2335

Thr Asn Ser Gln Gly Lys Thr Gly Val Ala Ala Gly Val Gly Tyr Gln
2340 2345 2350

Trp



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 658 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: unknown
(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

Met Asn Lys Ile Phe Asn Val Ile Trp Asn Val Val Thr Gln Thr Trp
1 5 10

Val Val Val Ser Glu Leu Thr Arg Thr His Thr Lys Cys Ala Ser Ala
20 25 30

Thr Val Ala Val Ala Val Leu Ala Thr Leu Leu Ser Ala Thr Val Glu
35 40

Ala Asn Asn Asn Thr Pro Val Thr Asn Lys Leu Lys Ala Tyr Gly Asp
50

Ala Asn Phe Asn Phe Thr Asn Asn Ser Ile Ala Asp Ala Glu Lys Gln
65 70 75

Val Gln Glu Ala Tyr Lys Gly Leu Leu Asn Leu Asn Glu Lys Asn Ala
85 90

Ser Asp Lys Leu Leu Val Glu Asp Asn Thr Ala Ala Thr Val Gly Asn
100 105

Leu Arg Lys Leu Gly Trp Val Leu Ser Ser Lys Asn Gly Thr Arg Asn
115 120

Glu Lys Ser Gln Gln Val Lys His Ala Asp Glu Val Leu Phe Glu Gly
130 135 140

Lys Gly Gly Val Gln Val Thr Ser Thr Ser Glu Asn Gly Lys His Thr
145 150 155

Ile Thr Phe Ala Leu Ala Lys Asp Leu Gly Val Lys Thr Ala Thr Val

165 170 

s .r A,p Thr L.» Thr He ,ly Gly ol, Al. U. 1. «r JJ; ™ ™ 

1B0 185 

Thr Pro Lys Val Asn Val Thr Ser Thr Thr Asp Gly Leu Lys Phe Ala 

195 200 
Lys Asp Ala Ala Gly Ala Asn Gly Asp Thr Thr Val His Leu Asn Gly 

210 215 
Ile Gly Ser Thr Leu Thr Asp Thr Leu Val Gly Ser Pro Ala Thr His
225 230

Ile Asp Gly Gly Asp Gln Ser Thr His Tyr Thr Arg Ala Ala Ser Ile
245 250
, ys * s p v.l «. *» «• "V Trp T n. ly «» \?l ^ 



260 265

Ser Thr Thr Gly Gln Ser Glu Asn Val Asp Phe Val His Thr Tyr Asp
275 280 285

Thr Val Glu Phe Leu Ser Ala Asp Thr Glu Thr Thr Thr Val Thr Val 
290 295 300 

Asp Ser Lys Glu Asn Gly Lys Arg Thr Glu Val Lys Ile Gly Ala Lys
305 310 315 320

Thr Ser Val Ile Lys Glu Lys Asp Gly Lys Leu Phe Thr Gly Lys Ala
325 330 335

Asn Lys Glu Thr Asn Lys Val Asp Gly Ala Asn Ala Thr Glu Asp Ala
340 345 350

Asp Glu Gly Lys Gly Leu Val Thr Ala Lys Asp Val Ile Asp Ala Val
355 360 365

Asn Lys Thr Gly Trp Arg Ile Lys Thr Thr Asp Ala Asn Gly Gln Asn
370 375 380

Gly Asp Phe Ala Thr Val Ala Ser Gly Thr Asn Val Thr Phe Ala Ser
385 390 395 400

Gly Asn Gly Thr Thr Ala Thr Val Thr Asn Gly Thr Asp Gly Ile Thr
405 410 415

Val Lys Tyr Asp Ala Lys Val Gly Asp Gly Leu Lys Leu Asp Gly Asp
420 425 430

Lys Ile Ala Ala Asp Thr Thr Ala Leu Thr Val Asn Asp Gly Lys Asn
435 440 445

Ala Asn Asn Pro Lys Gly Lys Val Ala Asp Val Ala Ser Thr Asp Glu
450 455 460

Lys Lys Leu Val Thr Ala Lys Gly Leu Val Thr Ala Leu Asn Ser Leu
465 470 475 480

Ser Trp Thr Thr Thr Ala Ala Glu Ala Asp Gly Gly Thr Leu Asp Gly
485 490 495

Asn Ala Ser Glu Gln Glu Val Lys Ala Gly Asp Lys Val Thr Phe Lys
500 505 510

Ala Gly Lys Asn Leu Lys Val Lys Gln Glu Gly Ala Asn Phe Thr Tyr
515 520 525

Ser Leu Gln Asp Ala Leu Thr Gly Leu Thr Ser Ile Thr Leu Gly Thr
530 535 540

Gly Asn Asn Gly Ala Lys Thr Glu Ile Asn Lys Asp Gly Leu Thr Ile
545 550 555 560



Thr Pro Ala Asn Gly Ala Gly Ala Asn Asn Ala Asn Thr Ile Ser Val
565 570 575

Thr Lys Asp Gly Ile Ser Ala Gly Gly Gln Ser Val Lys Asn Val Val
580

Ser Gly Leu Lys Lys Phe Gly Asp Ala Asn Phe Asp Pro Leu Thr Ser
595 600 605

Ser Ala Asp Asn Leu Thr Lys Gln Asn Asp Asp Ala Tyr Lys Gly Leu
610 615 620

Thr Asn Leu Asp Glu Lys Gly Thr Asp Lys Gln Thr Pro Val Val Ala
625 630 635 640

Asp Asn Thr Ala Ala Thr Val Gly Asp Leu Arg Gly Leu Gly Trp Val
645 650 655

Ile Ser



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 607 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : unknown 

(D) TOPOLOGY: unknown 



(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Met Asn Lys Ile Phe Asn Val Ile Trp Asn Val Met Thr Gln Thr Trp
1 5 10 15

Val Val Val Ser Glu Leu Thr Arg Thr His Thr Lys Arg Leu Arg Asn
20 25 30

Arg Gly Asp Pro Val Leu Ala Thr Leu Leu Phe Ala Thr Val Gln Ala
35 40 45

Asn Ala Thr Asp Glu Asp Glu Glu Leu Asp Pro Val Val Arg Thr Ala
50 55 60

Pro Val Leu Ser Phe His Ser Asp Lys Glu Gly Thr Gly Glu Lys Glu
65 70 75 80

Val Thr Glu Asn Ser Asn Trp Gly Ile Tyr Phe Asp Asn Lys Gly Val
85 90 95

Leu Lys Ala Gly Ala Ile Thr Leu Lys Ala Gly Asp Asn Leu Lys Xaa
100 105 110

Lys Gln Xaa Thr Asp Glu Xaa Thr Asn Ala Ser Ser Phe Thr Tyr Ser
115 120 125

Leu Lys Lys Asp Leu Thr Asp Leu Thr Ser Val Ala Thr Glu Lys Leu
130 135

Ser Phe Gly Ala Asn Gly Asp Lys Val Asp Ile Thr Ser Asp Ala Asn
145 150 155 160

Gly Leu Lys Leu Ala Lys Thr Gly Asn Gly Asn Val His Leu Asn Gly 
165 170 175 

Leu Asp Ser Thr Leu Pro Asp Ala Val Thr Asn Thr Gly Val Leu Ser 
180 185 190 

Ser Ser Ser Phe Thr Pro Asn Asp Val Glu Lys Thr Arg Ala Ala Thr 
195 200 205 

Val Lys Asp Val Leu Asn Ala Gly Trp Asn Ile Lys Gly Ala Lys Thr
210 215 220

Ala Gly Gly Asn Val Glu Ser Val Asp Leu Val Ser Ala Tyr Asn Asn
225 230 235 240

Val Glu Phe Ile Thr Gly Asp Lys Asn Thr Leu Asp Val Val Leu Thr
245 250 255

Ala Lys Glu Asn Xaa Lys Thr Thr Glu Val Lys Phe Thr Pro Lys Thr
260 265 270

Ser Val Ile Lys Glu Lys Asp Gly Lys Leu Phe Thr Gly Lys Glu Asn
275 280 285

Asn Asp Thr Asn Lys Val Thr Ser Asn Thr Ala Thr Asp Asn Thr Asp
290 295 300

Glu Gly Asn Gly Leu Val Thr Ala Lys Ala Val Ile Asp Ala Val Asn
305 310 315 320

Lys Ala Gly Trp Arg Val Lys Thr Thr Thr Ala Asn Gly Gln Asn Gly
325 330 335

Asp Phe Ala Thr Val Ala Ser Gly Thr Asn Val Thr Phe Glu Ser Gly
340 345 350

Asp Gly Thr Thr Ala Ser Val Thr Lys Asp Thr Asn Gly Asn Gly Ile
355 360 365

Thr Val Lys Tyr Asp Ala Lys Val Gly Asp Gly Leu Lys Phe Asp Ser
370 375 380

Asp Lys Lys Ile Val Ala Asp Thr Thr Ala Leu Thr Val Thr Gly Gly
385 390 395 400

Lys Val Ala Glu Ile Ala Lys Glu Asp Asp Lys Lys Lys Leu Val Asn
405 410 415 



Ala Gly Asp Leu Val Thr Ala Leu Gly Asn Leu Ser Trp Lys Ala Lys 
420 425 430 



Ala Glu Ala Asp Thr Asp Gly Ala Leu Glu Gly Ile Ser Lys Asp Gln
435 440 445

Glu Val Lys Ala Gly Glu Thr Val Thr Phe Lys Ala Gly Lys Asn Leu
450 455 460

Lys Val Lys Gln Asp Gly Ala Asn Phe Thr Tyr Ser Leu Gln Asp Ala
465 470 475 480

Leu Thr Gly Leu Thr Ser Ile Thr Leu Gly Gly Thr Thr Asn Gly Gly
485 490 495

Asn Asp Ala Lys Thr Val Ile Asn Lys Asp Gly Leu Thr Ile Thr Pro
500 505 510

Ala Gly Asn Gly Gly Thr Thr Gly Thr Asn Thr Ile Ser Val Thr Lys
515 520 525

Asp Gly Ile Lys Ala Gly Asn Lys Ala Ile Thr Asn Val Ala Ser Gly
530 535 540

Leu Arg Ala Tyr Asp Asp Ala Asn Phe Asp Val Leu Asn Asn Ser Ala
545 550 555 560

Thr Asp Leu Asn Arg His Val Glu Asp Ala Tyr Lys Gly Leu Leu Asn
565 570 575

Leu Asn Glu Lys Asn Ala Asn Lys Gln Pro Leu Val Thr Asp Ser Thr
580 585 590 

Ala Ala Thr Val Gly Asp Leu Arg Lys Leu Gly Trp Val Val Ser 
595 600 605 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Met Asn Lys Ile Phe Asn Val Ile Trp Asn Val Met Thr Gln Thr Trp
1 5 10 15

Val Val Val Ser Glu Leu Thr Arg 
20 

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown
(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Met Asn Lys Ile Phe Asn Val Ile Trp Asn Val Val Thr Gln Thr Trp
1 5 10 15

Val Val Val Ser Glu Leu Thr Arg 
20 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 amino acids
(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

Met Asn Lys Ile Tyr Arg Leu Lys Phe Ser Lys Arg Leu Asn Ala Leu
1 5 10 15 

Val Ala Val Ser Glu Leu Ala Arg 
20 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

Met Asn Lys Ile Tyr Arg Leu Lys Phe Ser Lys Arg Leu Asn Ala Leu
1 5 10 15

Val Ala Val Ser Glu Leu Ala Arg 
20 

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

Met Asn Lys Ala Tyr Ser Ile Ile Trp Ser His Ser Arg Gln Ala Trp
1 5 10 15

Ile Val Ala Ser Glu Leu Ala Arg
20 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

Met Asn Arg Ile Tyr Ser Leu Arg Tyr Ser Ala Val Ala Arg Gly Phe
1 5 10 15

Ile Ala Val Ser Glu Phe Ala Arg
20 

(2) INFORMATION FOR SEQ ID NO: 13: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

Met Asn Lys Ile Tyr Tyr Leu Lys Tyr Cys His Ile Thr Lys
1 5 10

Ile Ala Val Ser Glu Leu Ala Arg
20 



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2037 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

ATGAACAAAA TTTTTAACGT TATTTGGAAT GTTGTGACTC AAACTTGGGT TGTCGTATCT 60

GAACTCACTC GCACCCACAC CAAATGCGCC TCCGCCACCG TGGCAGTTGC CGTATTGGCA 120

ACCCTGTTGT CCGCAACGGT TCAGGCGAAT GCTACCGATG AAAACGAAGA TGATGAAGAA 180

GAGTTAGAAC CCGTACAACG CTCTGTTTTA AGGTGGAGCT TCAAATCCGC TAAGGAAGGC 240

ACTGGAGAAC AAGAGGGAAC AACAGAGGTA ATAAATTTGA ACACAGATTC ATCAGGAAAT 300

GCAGTAGGAA GCAGCACAAT CACCTTCAAA GCCGGCGACA ACCTGAAAAT CAAACAAAGC 360

GGCAATGACT TCACCTACTC GCTGAAAAAA GAGCTGAAAA ACCTGACCAG TGTTGAAACT 420

GAAAAATTAT CGTTTGGCGC AAACGGCAAT AAAGTTGATA TTACCAGTGA TGCAAATGGC 480

TTGAAATTGG CGAAAACAGG TAACGGAAAT GGTCAAAACA GTAATGTTCA CTTAAACGGT 540

ATTGCTTCGA CTTTGACCGA TACGCTTGCC GGTGGCACAA CAGGACACGT TGACACCAAC 600

ATTGATGCGG TTAATTATCA TCGCGCTGCA AGCGTACAAG ATGTGTTAAA CAGCGGTTGG 660

AATATCCAAG GCAATGGAAA CAATGTCGAT TTTGTCCGTA CTTACGACAC CGTGGACTTT 720

GTCAATGGCG CGAATGCCAA TGTGAGCGTT ACGGCTGATA CGGCTCACAA AAAGACAACT 780

GTCCGTGTGG ATGTAACAGG CTTGCCGGTT CAATATGTTA CGGAAGACGG CAAAACCGTT 840

GTGAAAGTGG GCAATGAGTA TTACAAAGCC AAAGATGACG GTTCGGCGGA TATGAATCAA 900

AAAGTCGAAA ACGGCGAGCT GGCGAAAACC AAAGTGAAAT TGGTATCGGC AAGCGGTACA 960

AATCCGGTGA AAATTAGCAA TGTTGCAGAC GGCACGGAAG ACACCGATGC GGTCAGCTTT 1020

AAGCAATTAA AAGCCTTGCA AGACAAACAG GTTACGTTGA GCACGAGCAA TGCTTATGCC 1080

AATGGCGGTA            CGGCGGCAAG GCAACTCAAA CTTTAAGCAA TGGTTTGAAT 1140

TTTAAATTTA AATCTAGCGA TGGCGAGTTG TTGAAAATTA GCGCGACCGG CGATACGGTT 1200

ACTTTTACGC CGAAAAAAGG TTCGGTACAG GTTGGCGATG ATGGCAAGGC TTCAATTTCA 1260

AAAGGTGCAA ATACAACTGA AGGTTTGGTT GAGGCTTCTG AATTGGTTGA AAGCCTGAAC 1320

AAACTGGGTT GGAAAGTAGG GGTTGAGAAA GTCGGCAGCG GCGAGCTTGA TGGTACATCC 1380

AAGGAAACTT TAGTGAAGTC GGGCGATAAA GTAACTTTGA AAGCCGGCGA CAATCTGAAG 1440

GTCAAACAAG AGGGCACAAA CTTCACTTAC GCGCTCAAAG ATGAATTGAC GGGCGTGAAG 1500

AGCGTGGAGT TTAAAGACAC GGCGAATGGT GCAAACGGTG CAAGCACGAA GATTACCAAA 1560

GACGGCTTGA CCATTACGCT GGCAAACGGT GCGAATGGTG CGACGGTGAC TGATGCCGAC 1620

AAGATTAAAG TTGCTTCGGA CGGCATTAGC GCGGGTAATA AAGCAGTTAA AAACGTCGCG 1680

GCAGGCGAAA TTTCTGCCAC TTCCACCGAT GCGATTAACG GAAGCCAGTT GTATGCCGTG 1740

GCAAAAGGGG TAACAAACCT TGCTGGACAA GTGAATAATC TTGAGGGCAA AGTGAATAAA 1800

GTGGGCAAAC GTGCAGATGC AGGTACTGCA AGTGCATTAG CGGCTTCACA GTTACCACAA 1860

GCCACTATGC CAGGTAAATC AATGGTTTCT ATTGCGGGAA GTAGTTATCA AGGTCAAAAT 1920

GGTTTAGCTA TCGGGGTATC AAGAATTTCC GATAATGGCA AAGTGATTAT TCGCTTGTCT 1980

GGCACAACCA ATAGTCAAGG TAAAACAGGC GTTGCAGCAG GTGTTGGTTA CCAGTGG 2037

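
The relationship between the nucleic acid of SEQ ID NO: 14 (2037 base pairs) and the protein of SEQ ID NO: 15 (679 amino acids) can be checked by conceptual translation, since 2037 / 3 = 679. The following is a minimal Python sketch, not part of the original disclosure. It assumes the standard genetic code and reading frame 1, and the names `CODON_TABLE`, `translate`, and `first_60` are illustrative only. It translates the first 60 bases listed for SEQ ID NO: 14 into one-letter amino acid codes.

```python
# Standard genetic code (assumed), built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; whitespace is ignored."""
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3  # drop any incomplete trailing codon
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# First 60 bases of SEQ ID NO: 14 as listed above.
first_60 = "ATGAACAAAA TTTTTAACGT TATTTGGAAT GTTGTGACTC AAACTTGGGT TGTCGTATCT"
print(translate(first_60))  # one-letter codes for the first 20 residues
```

The one-letter output can be compared by eye with the first 20 residues of SEQ ID NO: 15 (Met Asn Lys Ile Phe Asn Val Ile Trp Asn ...).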
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 679 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 

(Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

Met Asn Lys Ile Phe Asn Val Ile Trp Asn Val Val Thr Gln Thr Trp
1 5 10 15

Val Val Val Ser Glu Leu Thr Arg Thr His Thr Lys Cys Ala Ser Ala
20 25 30

Thr Val Ala Val Ala Val Leu Ala Thr Leu Leu Ser Ala Thr Val Gln
35 40 45

Ala Asn Ala Thr Asp Glu Asn Glu Asp Asp Glu Glu Glu Leu Glu Pro
50 55 60

Val Gln Arg Ser Val Leu Arg Trp Ser Phe Lys Ser Ala Lys Glu Gly
65 70 75 80

Thr Gly Glu Gln Glu Gly Thr Thr Glu Val Ile Asn Leu Asn Thr Asp
85 90 95

Ser Ser Gly Asn Ala Val Gly Ser Ser Thr Ile Thr Phe Lys Ala Gly
100 105 110

Asp Asn Leu Lys Ile Lys Gln Ser Gly Asn Asp Phe Thr Tyr Ser Leu
115 120 125

Lys Lys Glu Leu Lys Asn Leu Thr Ser Val Glu Thr Glu Lys Leu Ser
130 135 140

Phe Gly Ala Asn Gly Asn Lys Val Asp Ile Thr Ser Asp Ala Asn Gly
145 150 155 160

Leu Lys Leu Ala Lys Thr Gly Asn Gly Asn Gly Gln Asn Ser Asn Val
165 170 175

His Leu Asn Gly Ile Ala Ser Thr Leu Thr Asp Thr Leu Ala Gly Gly
180 185 190

Thr Thr Gly His Val Asp Thr Asn Ile Asp Ala Val Asn Tyr His Arg
195 200 205

Ala Ala Ser Val Gln Asp Val Leu Asn Ser Gly Trp Asn Ile Gln Gly
210 215 220

Asn Gly Asn Asn Val Asp Phe Val Arg Thr Tyr Asp Thr Val Asp Phe
225 230 235 240

Val Asn Gly Ala Asn Ala Asn Val Ser Val Thr Ala Asp Thr Ala His
245 250 255

Lys Lys Thr Thr Val Arg Val Asp Val Thr Gly Leu Pro Val Gln Tyr
260 265 270

Val Thr Glu Asp Gly Lys Thr Val Val Lys Val Gly Asn Glu Tyr Tyr
275 280 285

Lys Ala Lys Asp Asp Gly Ser Ala Asp Met Asn Gln Lys Val Glu Asn
290 295 300

Gly Glu Leu Ala Lys Thr Lys Val Lys Leu Val Ser Ala Ser Gly Thr
305 310 315 320

Asn Pro Val Lys Ile Ser Asn Val Ala Asp Gly Thr Glu Asp Thr Asp
325 330 335

Ala Val Ser Phe Lys Gln Leu Lys Ala Leu Gln Asp Lys Gln Val Thr
340 345 350

Leu Ser Thr Ser Asn Ala Tyr Ala Asn Gly Gly Thr Asp Asn Asp Gly
355 360 365

Gly Lys Ala Thr Gln Thr Leu Ser Asn Gly Leu Asn Phe Lys Phe Lys
370 375 380

Ser Ser Asp Gly Glu Leu Leu Lys Ile Ser Ala Thr Gly Asp Thr Val
385 390 395 400

Thr Phe Thr Pro Lys Lys Gly Ser Val Gln Val Gly Asp Asp Gly Lys
405 410 415

Ala Ser Ile Ser Lys Gly Ala Asn Thr Thr Glu Gly Leu Val Glu Ala
420 425 430

Ser Glu Leu Val Glu Ser Leu Asn Lys Leu Gly Trp Lys Val Gly Val
435 440 445

Glu Lys Val Gly Ser Gly Glu Leu Asp Gly Thr Ser Lys Glu Thr Leu
450 455 460

Val Lys Ser Gly Asp Lys Val Thr Leu Lys Ala Gly Asp Asn Leu Lys
465 470 475 480

Val Lys Gln Glu Gly Thr Asn Phe Thr Tyr Ala Leu Lys Asp Glu Leu
485 490 495

Thr Gly Val Lys Ser Val Glu Phe Lys Asp Thr Ala Asn Gly Ala Asn
500 505 510

Gly Ala Ser Thr Lys Ile Thr Lys Asp Gly Leu Thr Ile Thr Leu Ala
515 520 525

Asn Gly Ala Asn Gly Ala Thr Val Thr Asp Ala Asp Lys Ile Lys Val
530 535 540

Ala Ser Asp Gly Ile Ser Ala Gly Asn Lys Ala Val Lys Asn Val Ala
545 550 555 560

Ala Gly Glu Ile Ser Ala Thr Ser Thr Asp Ala Ile Asn Gly Ser Gln
565 570 575

Leu Tyr Ala Val Ala Lys Gly Val Thr Asn Leu Ala Gly Gln Val Asn
580 585 590

Asn Leu Glu Gly Lys Val Asn Lys Val Gly Lys Arg Ala Asp Ala Gly
595 600 605

Thr Ala Ser Ala Leu Ala Ala Ser Gln Leu Pro Gln Ala Thr Met Pro
610 615 620

Gly Lys Ser Met Val Ser Ile Ala Gly Ser Ser Tyr Gln Gly Gln Asn
625 630 635 640

Gly Leu Ala Ile Gly Val Ser Arg Ile Ser Asp Asn Gly Lys Val Ile
645 650 655

Ile Arg Leu Ser Gly Thr Thr Asn Ser Gln Gly Lys Thr Gly Val Ala
660 665 670

Ala Gly Val Gly Tyr Gln Trp
675



(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
CCGTGCTTGC CCAACACGCT T 
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
GCTGCCACCT TGCACAACAA C 
(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: unknown
(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

CTTTCAATGC CAGAAAGTAG G 

(2) INFORMATION FOR SEQ ID NO:19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
CTTCAACCGT TGCGGACAAC A 
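
SEQ ID NOS: 16 to 19 are each listed as 21 bases with strandedness given as unknown. When locating such a short sequence within a longer one, both the sequence as written and its reverse complement are normally searched. The following is a minimal Python sketch under that assumption. It is not part of the original disclosure, the names `reverse_complement` and `OLIGOS` are illustrative only, and it handles unambiguous bases (A, C, G, T) only.

```python
# Complement table for unambiguous DNA bases (assumption: no IUPAC ambiguity codes).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string; whitespace is ignored."""
    seq = "".join(dna.split()).upper()
    return seq.translate(_COMPLEMENT)[::-1]


# The four 21-base sequences exactly as listed in SEQ ID NOS: 16 to 19.
OLIGOS = {
    16: "CCGTGCTTGC CCAACACGCT T",
    17: "GCTGCCACCT TGCACAACAA C",
    18: "CTTTCAATGC CAGAAAGTAG G",
    19: "CTTCAACCGT TGCGGACAAC A",
}

for seq_id, oligo in OLIGOS.items():
    bases = "".join(oligo.split())
    print(f"SEQ ID NO: {seq_id}: {len(bases)} bases, "
          f"reverse complement {reverse_complement(bases)}")
```

Applying `reverse_complement` twice returns the original sequence, which gives a quick sanity check on any transcription of these listings.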
CLAIMS

We claim:

1. A recombinant Haemophilus adhesion protein.

2. A recombinant Haemophilus adhesion protein according to claim 1 which has
a sequence homologous to that shown in Figure 2.

3. A recombinant Haemophilus adhesion protein according to claim 1 which has
a sequence homologous to the amino acid sequence shown in Figure 3.

4. A recombinant Haemophilus adhesion protein according to claim 1 which has
the sequence shown in Figure 2.

5. A recombinant Haemophilus adhesion protein according to claim 1 which has
the amino acid sequence shown in Figure 3.

6. A recombinant nucleic acid encoding an Haemophilus adhesion protein.

7. The nucleic acid of claim 6 comprising DNA having a sequence homologous to
that shown in Figure 1.

8. The nucleic acid of claim 6 comprising DNA having a sequence homologous to
that shown in Figure 3.

9. The nucleic acid of claim 6 comprising DNA capable of hybridizing to that shown
in Figure 1.

10. The nucleic acid of claim 6 comprising DNA capable of hybridizing to that shown
in Figure 3.

11. The nucleic acid of claim 6 comprising DNA having the sequence shown in
Figure 1.

12. The nucleic acid of claim 6 comprising DNA having the sequence shown in
Figure 3.

13. An expression vector comprising transcriptional and translational regulatory
nucleic acid operably linked to nucleic acid encoding an Haemophilus adhesion
protein.

14. A host cell transformed with an expression vector comprising a nucleic acid
encoding an Haemophilus adhesion protein.

15. A method of producing an Haemophilus adhesion protein comprising:

a) culturing a host cell transformed with an expressing vector comprising
a nucleic acid encoding an Haemophilus adhesion protein; and

b) expressing said nucleic acid to produce an Haemophilus adhesion protein.

16. A vaccine comprising a pharmaceutically acceptable carrier and an Haemophilus
adhesion protein for prophylactic or therapeutic use in generating an immune
response.

17. A vaccine according to claim 16 wherein said Haemophilus adhesion protein
has a sequence homologous to that shown in Figure 2.

18. A vaccine according to claim 16 wherein said Haemophilus adhesion protein
has a sequence homologous to the amino acid sequence shown in Figure 3.

19. A monoclonal antibody capable of binding to an Haemophilus adhesion protein.

20. A method of treating or preventing Haemophilus influenzae infection comprising
administering the vaccine of claim 16.

21. A method of treating or preventing a Haemophilus influenzae infection according
to claim 20 wherein said H. influenzae infection is caused by a non-typable H.
influenzae.
ATGAACAAAA TTTTTAACGT TATTTGGAAT GTTGTGACTC AAACTTGGGT TGTCGTATCT 60

GAACTCACTC GCACCCACAC CAAATGCGCC TCCGCCACCG TGGCGGTTGC CGTATTGGCA 120 

ACCCTGTTGT CCGCAACGGT TGAGGCGAAC AACAATACTC CTGTTACGAA TAAGTTGAAG 180 

GCTTATGGCG ATGCGAATTT TAATTTCACT AATAATTCGA TAGCAGATGC AGAAAAACAA 240 

GTTCAAGAGG CTTATAAAGG TTTATTAAAT CTAAATGAAA AAAATGCGAG TGATAAACTG 300 

TTGGTGGAGG ACAATACTGC GGCGACCGTA GGCAATTTGC GTAAATTGGG CTGGGTATTG 360 

TCTAGCAAAA ACGGCACAAG GAACGAGAAA AGCCAACAAG TCAAACATGC GGATGAAGTG 420 

TTGTTTGAAG GCAAAGGCGG TGTGCAGGTT ACTTCCACCT CTGAAAACGG CAAACACACC 480 

ATTACCTTTG CTTTAGCGAA AGACCTTGGT GTGAAAACTG CGACTGTGAG TGATACCTTA 540 

ACGATTGGCG GTGGTGCTGC TGCAGGTGCT ACAACAACAC CGAAAGTGAA TGTAACTAGT 600 

ACAACTGATG GCTTGAAGTT CGCTAAAGAT GCTGCGGGTG CTAATGGCGA TACTACGGTT 660 

CACTTGAATG GTATTGGTTC AACCTTGACA GACACGCTTG TGGGTTCTCC TGCTACTCAT 720 

ATTGACGGAG GAGATCAAAG TACGCATTAC ACTCGTGCAG CAAGTATCAA GGATGTCTTG 780 

AATGCGGGTT GGAATATCAA GGGTGTTAAA GCTGGCTCAA CAACTGGTCA ATCAGAAAAT 840 

GTCGATTTTG TTCATACTTA CGATACTGTT GAGTTCTTGA GTGCGGATAC AGAGACCACG 900 

ACTGTTACTG TAGATAGCAA AGAAAACGGT AAGAGAACCG AAGTTAAAAT CGGTGCGAAG 960 

ACTTCTGTTA TCAAAGAAAA AGACGGTAAG TTATTTACTG GAAAAGCTAA CAAAGAGACA 1020 

AATAAAGTTG ATGGTGCTAA CGCGACTGAA GATGCAGACG AAGGCAAAGG CTTAGTGACT 1080

GCGAAAGATG TGATTGACGC AGTGAATAAG ACTGGTTGGA GAATTAAAAC AACCGATGCT 1140 

AATGGTCAAA ATGGCGACTT CGCAACTGTT GCATCAGGCA CAAATGTAAC CTTTGCTAGT 1200 

GGTAATGGTA CAACTGCGAC TGTAACTAAT GGCACCGATG GTATTACCGT TAAGTATGAT 1260 

GCGAAAGTTG GCGACGGCTT AAAACTAGAT GGCGATAAAA TCGCTGCAGA TACGACCGCA 1320 
CTTACTGTGA ATGATGGTAA GAACGCTAAT AATCCGAAAG GTAAAGTGGC TGATGTTGCT 1380

TCAACTGACG AGAAGAAATT GGTTACAGCA AAAGGTTTAG TAACAGCCTT AAACAGTCTA 1440

AGCTGGACTA CAACTGCTGC TGAGGCGGAC GGTGGTACGC TTGATGGAAA TGCAAGTGAG 1500

CAAGAAGTTA AAGCGGGCGA TAAAGTAACC TTTAAAGCAG GCAAGAACTT AAAAGTGAAA 1560

CAAGAGGGTG CGAACTTTAC TTATTCACTG CAAGATGCTT TAACAGGCTT AACGAGCATT 1620

ACTTTAGGTA CAGGAAATAA TGGTGCGAAA ACTGAAATCA ACAAAGACGG CTTAACCATC 1680

ACACCAGCAA ATGGTGCGGG TGCAAATAAT GCAAACACCA TCAGCGTAAC CAAAGACGGC 1740

ATTAGTGCGG GCGGTCAGTC GGTTAAAAAC GTTGTGAGCG GACTGAAGAA ATTTGGTGAT 1800

GCGAATTTCG ATCCGCTGAC TAGCTCCGCC GACAACTTAA CGAAACAAAA TGACGATGCC 1860

TATAAAGGCT TGACCAATTT GGATGAAAAA GGTACAGACA AGCAAACTCC AGTTGTTGCC 1920

GACAATACCG CCGCAACCGT GGGCGATTTG CGCGGCTTGG GCTGGGTCAT TTCTGCGGAC 1980

AAAACCACAG GCGGCTCAAC GGAATATCAC GATCAAGTTC GGAATGCGAA CGAAGTGAAA 2040

TTCAAAAGCG GCAACGGTAT CAATGTTTCC GGTAAAACGG TCAACGGTAG GCGTGAAATT 2100

ACTTTTGAAT TGGCTAAAGG TGAAGTGGTT AAATCGAATG AATTTACCGT CAAAGAAACC 2160

AATGGAAAGG AAACGAGCCT GGTTAAAGTT GGCGATAAAT ATTACAGCAA AGAGGATATT 2220

GACTTAACAA CAGGTCAGCC TAAATTAAAA GATGGCAATA CAGTTGCTGC GAAATATCAA 2280

GATAAAGGTG GCAAAGTCGT TTCTGTAACG GATAATACTG AAGCTACCAT AACCAACAAA 2340

GGTTCTGGCT ATGTAACAGG TAACCAAGTG GCAGATGCGA TTGCGAAATC AGGCTTTGAG 2400

CTTGGCTTGG CTGATGAAGC TGATGCGAAA CGGGCGTTTG ATGATAAGAC AAAAGCCTTA 2460

TCTGCTGGTA CAACGGAAAT TGTAAATGCC CACGATAAAG TCCGTTTTGC TAATGGTTTA 2520

AATACCAAAG TGAGCGCGGC AACGGTGGAA AGCACCGATG CAAACGGCGA TAAAGTGACC 2580

ACAACCTTTG TGAAAACCGA TGTGGAATTG CCTTTAACGC AAATCTACAA TACCGATGCA 2640

AACGGTAAGA AAATCACTAA AGTTGTCAAA GATGGGCAAA CTAAATGGTA TGAACTGAAT 2700

GCTGACGGTA CGGCTGATAT GACCAAAGAA GTTACCCTCG GTAACGTGGA TTCAGACGGC 2760

AAGAAAGTTG TGAAAGACAA CGATGGCAAG TGGTATCACG CCAAAGCTGA CGGTACTGCG 2820

GATAAAACCA AAGGCGAAGT GAGCAATGAT AAAGTTTCTA CCGATGAAAA ACACGTTGTC 2880

AGCCTTGATC CAAATGATCA ATCAAAAGGT AAAGGTGTCG TGATTGACAA TGTGGCTAAT 2940

GGCGATATTT CTGCCACTTC CACCGATGCG ATTAACGGAA GTCAGTTGTA TGCTGTGGCA 3000

AAAGGGGTAA CAAACCTTGC TGGACAAGTG AATAATCTTG AGGGCAAAGT GAATAAAGTG 3060

GGCAAACGTG CAGATGCAGG TACAGCAAGT GCATTAGCGG CTTCACAGTT ACCACAAGCC 3120

ACTATGCCAG GTAAATCAAT GGTTGCTATT GCGGGAAGTA GTTATCAAGG TCAAAATGGT 3180

TTAGCTATCG GGGTATCAAG AATTTCCGAT AATGGCAAAG TGATTATTCG CTTGTCAGGC 3240

ACAACCAATA GTCAAGGTAA AACAGGCGTT GCAGCAGGTG TTGGTTACCA GTGG 3294

FIG. 1C
Met Asn Lys Ile Phe Asn Val Ile Trp Asn Val Val Thr Gln Thr Trp
1 5 10 15

Val Val Val Ser Glu Leu Thr Arg Thr His Thr Lys Cys Ala Ser Ala
20 25 30

Thr Val Ala Val Ala Val Leu Ala Thr Leu Leu Ser Ala Thr Val Glu
35 40 45

Ala Asn Asn Asn Thr Pro Val Thr Asn Lys Leu Lys Ala Tyr Gly Asp
50 55 60

Ala Asn Phe Asn Phe Thr Asn Asn Ser Ile Ala Asp Ala Glu Lys Gln
65 70 75 80

Val Gln Glu Ala Tyr Lys Gly Leu Leu Asn Leu Asn Glu Lys Asn Ala
85 90 95

Ser Asp Lys Leu Leu Val Glu Asp Asn Thr Ala Ala Thr Val Gly Asn
100 105 110

Leu Arg Lys Leu Gly Trp Val Leu Ser Ser Lys Asn Gly Thr Arg Asn
115 120 125

Glu Lys Ser Gln Gln Val Lys His Ala Asp Glu Val Leu Phe Glu Gly
130 135 140

Lys Gly Gly Val Gln Val Thr Ser Thr Ser Glu Asn Gly Lys His Thr
145 150 155 160

Ile Thr Phe Ala Leu Ala Lys Asp Leu Gly Val Lys Thr Ala Thr Val
165 170 175

Ser Asp Thr Leu Thr Ile Gly Gly Gly Ala Ala Ala Gly Ala Thr Thr
180 185 190

Thr Pro Lys Val Asn Val Thr Ser Thr Thr Asp Gly Leu Lys Phe Ala
195 200 205

Lys Asp Ala Ala Gly Ala Asn Gly Asp Thr Thr Val His Leu Asn Gly
210 215 220

Ile Gly Ser Thr Leu Thr Asp Thr Leu Val Gly Ser Pro Ala Thr His
225 230 235 240

Ile Asp Gly Gly Asp Gln Ser Thr His Tyr Thr Arg Ala Ala Ser Ile
245 250 255

Lys Asp Val Leu Asn Ala Gly Trp Asn Ile Lys Gly Val Lys Ala Gly
260 265 270

Ser Thr Thr Gly Gln Ser Glu Asn Val Asp Phe Val His Thr Tyr Asp
275 280 285

Thr Val Glu Phe Leu Ser Ala Asp Thr Glu Thr Thr Thr Val Thr Val
290 295 300

Asp Ser Lys Glu Asn Gly Lys Arg Thr Glu Val Lys Ile Gly Ala Lys
305 310 315 320

Thr Ser Val Ile Lys Glu Lys Asp Gly Lys Leu Phe Thr Gly Lys Ala
325 330 335

Asn Lys Glu Thr Asn Lys Val Asp Gly Ala Asn Ala Thr Glu Asp Ala
340 345 350

Asp Glu Gly Lys Gly Leu Val Thr Ala Lys Asp Val Ile Asp Ala Val
355 360 365

Asn Lys Thr Gly Trp Arg Ile Lys Thr Thr Asp Ala Asn Gly Gln Asn
370 375 380

Gly Asp Phe Ala Thr Val Ala Ser Gly Thr Asn Val Thr Phe Ala Ser
385 390 395 400

Gly Asn Gly Thr Thr Ala Thr Val Thr Asn Gly Thr Asp Gly Ile Thr
405 410 415

Val Lys Tyr Asp Ala Lys Val Gly Asp Gly Leu Lys Leu Asp Gly Asp
420 425 430

Lys Ile Ala Ala Asp Thr Thr Ala Leu Thr Val Asn Asp Gly Lys Asn
435 440 445

Ala Asn Asn Pro Lys Gly Lys Val Ala Asp Val Ala Ser Thr Asp Glu
450 455 460

Lys Lys Leu Val Thr Ala Lys Gly Leu Val Thr Ala Leu Asn Ser Leu
465 470 475 480

Ser Trp Thr Thr Thr Ala Ala Glu Ala Asp Gly Gly Thr Leu Asp Gly
485 490 495

Asn Ala Ser Glu Gln Glu Val Lys Ala Gly Asp Lys Val Thr Phe Lys
500 505 510

Ala Gly Lys Asn Leu Lys Val Lys Gln Glu Gly Ala Asn Phe Thr Tyr
515 520 525

Ser Leu Gln Asp Ala Leu Thr Gly Leu Thr Ser Ile Thr Leu Gly Thr
530 535 540

Gly Asn Asn Gly Ala Lys Thr Glu Ile Asn Lys Asp Gly Leu Thr Ile
545 550 555 560

Thr Pro Ala Asn Gly Ala Gly Ala Asn Asn Ala Asn Thr Ile Ser Val
565 570 575

Thr Lys Asp Gly Ile Ser Ala Gly Gly Gln Ser Val Lys Asn Val Val
580 585 590
Ser Gly Leu Lys Lys Phe Gly Asp Ala Asn Phe Asp Pro Leu Thr Ser
595 600 605

Ser Ala Asp Asn Leu Thr Lys Gln Asn Asp Asp Ala Tyr Lys Gly Leu
610 615 620

Thr Asn Leu Asp Glu Lys Gly Thr Asp Lys Gln Thr Pro Val Val Ala
625 630 635 640

Asp Asn Thr Ala Ala Thr Val Gly Asp Leu Arg Gly Leu Gly Trp Val
645 650 655

Ile Ser Ala Asp Lys Thr Thr Gly Gly Ser Thr Glu Tyr His Asp Gln
660 665 670

Val Arg Asn Ala Asn Glu Val Lys Phe Lys Ser Gly Asn Gly Ile Asn
675 680 685

Val Ser Gly Lys Thr Val Asn Gly Arg Arg Glu Ile Thr Phe Glu Leu
690 695 700

Ala Lys Gly Glu Val Val Lys Ser Asn Glu Phe Thr Val Lys Glu Thr
705 710 715 720

Asn Gly Lys Glu Thr Ser Leu Val Lys Val Gly Asp Lys Tyr Tyr Ser
725 730 735

Lys Glu Asp Ile Asp Leu Thr Thr Gly Gln Pro Lys Leu Lys Asp Gly
740 745 750

Asn Thr Val Ala Ala Lys Tyr Gln Asp Lys Gly Gly Lys Val Val Ser
755 760 765

Val Thr Asp Asn Thr Glu Ala Thr Ile Thr Asn Lys Gly Ser Gly Tyr
770 775 780

Val Thr Gly Asn Gln Val Ala Asp Ala Ile Ala Lys Ser Gly Phe Glu
785 790 795 800

Leu Gly Leu Ala Asp Glu Ala Asp Ala Lys Arg Ala Phe Asp Asp Lys
805 810 815

Thr Lys Ala Leu Ser Ala Gly Thr Thr Glu Ile Val Asn Ala His Asp
820 825 830

Lys Val Arg Phe Ala Asn Gly Leu Asn Thr Lys Val Ser Ala Ala Thr
835 840 845

Val Glu Ser Thr Asp Ala Asn Gly Asp Lys Val Thr Thr Thr Phe Val
850 855 860

Lys Thr Asp Val Glu Leu Pro Leu Thr Gln Ile Tyr Asn Thr Asp Ala
865 870 875 880

Asn Gly Lys Lys Ile Thr Lys Val Val Lys Asp Gly Gln Thr Lys Trp
885 890 895
Tyr Glu Leu Asn Ala Asp Gly Thr Ala Asp Met Thr Lys Glu Val Thr
900 905 910

Leu Gly Asn Val Asp Ser Asp Gly Lys Lys Val Val Lys Asp Asn Asp
915 920 925

Gly Lys Trp Tyr His Ala Lys Ala Asp Gly Thr Ala Asp Lys Thr Lys
930 935 940

Gly Glu Val Ser Asn Asp Lys Val Ser Thr Asp Glu Lys His Val Val
945 950 955 960

Ser Leu Asp Pro Asn Asp Gln Ser Lys Gly Lys Gly Val Val Ile Asp
965 970 975

Asn Val Ala Asn Gly Asp Ile Ser Ala Thr Ser Thr Asp Ala Ile Asn
980 985 990

Gly Ser Gln Leu Tyr Ala Val Ala Lys Gly Val Thr Asn Leu Ala Gly
995 1000 1005

Gln Val Asn Asn Leu Glu Gly Lys Val Asn Lys Val Gly Lys Arg Ala
1010 1015 1020

Asp Ala Gly Thr Ala Ser Ala Leu Ala Ala Ser Gln Leu Pro Gln Ala
1025 1030 1035 1040

Thr Met Pro Gly Lys Ser Met Val Ala Ile Ala Gly Ser Ser Tyr Gln
1045 1050 1055

Gly Gln Asn Gly Leu Ala Ile Gly Val Ser Arg Ile Ser Asp Asn Gly
1060 1065 1070

Lys Val Ile Ile Arg Leu Ser Gly Thr Thr Asn Ser Gln Gly Lys Thr
1075 1080 1085

Gly Val Ala Ala Gly Val Gly Tyr Gln Trp
1090 1095
61 AAATATCACTTTTTTATTCTCCAAATATAGAATAGAATACGCACGATTTCACTAAGAAAA 120
♦•*••« 
121 GTATATTTATCATTAATOTTATTAAATA TAAGGTAAA 180 

M N K I F N 

181 GTTATTTGGAATGTTATGACTCAAACTTGGGTTGTCGTATCTGAACTCACTCGCACCCAC 240 
VIWNVMTQTWVVVSELTRTH 

241 ACCAAACGCGCCTCCGCAACCGTGGAGACCGCCGTATTGGCGACACTGTTGTTTGCAACG 300 
T K RA S ATVE T AVLAT L L FAT 

• • • . 

301 GTTCAGGCGAATGCTACCGATGAAGATGAAGAGTTAGACCCCGTAGTACGCACTGCTCCC 360 
VQANATD E DEELD PVVRTAP 

361 GTGTTGAGCTTCCATTCCGATAAAGAAGGCACGGG AGAAAAAGAAGTTACAGAAAATTCA 420 
VLSFHSDKEGTG EK EV TE NS 

• • • « • « 

421 AATTGGGGAATATATTTCGAC AATAAAGGAGTACTAAAAGCCGGAGCAATCACCCTCAAA 480 
NWGIYFDKKGVLKAGAITLK 

481 GCCGGCGACAACCTGAAAATCAAACAAAACACCGATGAAAGCACCAATGCCAGTAGCTTC 540 
AGDNLKIKQNTDESTN A SSF 

541 ACCTACTCGCTGAAAAAAGACCTCACAGATCTGACCAGTGTTGCAACTGAAAAATTATCG 600 
TYSLKKDLTDLTSVATEKL S 

• • • • • # 

601 TTTGGCGCAAACGGCGATAAAGTTGATATTACCAGTGATGCAAATGGCTTGAAATTGGCG 660 
FGANGDK VD ITS DAN G LKLA 

661 AAAACAGGTAACGGAAATGTTCATTTGAATGGTTTGGATTCAACTTTGCCTGATGCGGTA 720 
KTGNGNVHLNGLDSTLPDAV 

721 ACGAATACAGGTGTGTTAAGTTCATCAAGTTTTACACCTAATGATGTTGAAAAAACAAGA 780 
TNTGVLSSSS'FTPNDVEKTR 

• • • * * • 

781 GCTGCAACTGTTAAAGATGTTTTAAATGCAGGTTGGAACATTAAAGGTGCTAAAACTGCT 840 
AATVKDVLNAGWNIKGAKTA 

• • • • * . 

841 GGAGGTAATGTTGAGAGTGTTGATTTAGTGTCCGCTTATAATAATGTTGAATTTATTACA 900 
GGNVESVDLVSAYNNVEFXT 

• • • • * . 

901 GGCGATAAAAACACGCTTGATGTTGTATTAACAGCTAAAGAAAACGGTAAAACAACCGAA 960 
GDKNTLDVVLTAKENGKTTE 

961 GTGAAATTCACACCGAAAACCTCTGTTATCAAAGAAAAAGACGGTAAGTTATTTACTGGA 1020 
VKFTPKTSVIKEKDGKLFTG 

• • . . • . 

1021 AAAG AGAATAACGACACAAATAAAGTTACAAGTAACACGGCGACTGATAATACAGATGAG 1080 
KENNDTNKVTSNTATDNTDE 

1081 GGTAATGGCTTAGTCACTGCAAAAGCTGTGATTGATGCTGTGAACAAGGCTGGTTGGAGA 1140 
GNGLVTAKAVIDAVNKAGWR 
1141 GTTAAAACAACTACTGCTAATGGTCAAAATGGCGACTTCGCAACTGTTGCGTCAGGCACA 1200
VKTTTANGQNGDFATVASGT 

1201 AATGTAACCTTTGAAAGTGGCGATGGTACAACAGCGTCAGTAACTAAAGATACTAACGGC 1260 
NVTFESGDGTTASVTKDTNG 

1261 AATGGCATCACTGTTAAGTACGACGCGAAAGTTGGCGACGGCTTGAAATTTGATAGCGAT 1320 
NGITVKYDAKVGDGLKFDSD 

1321 AAAAAAATCGTTGCAGATACGACCGCACTTACTGTGACAGGTGGTAAGGTAGCTGAAATT 1380 
KKIVADTTALTVTGGKVAEI 

1381 GCTAAAGAAGATGACAAGAAAAAACTTGTTAATGCAGGCGATTTGGTAACAGCTTTAGGT 1440 
AKEDDKKKLVNAGDLVTALG* 
* 

1441 AATCTAAGTTGGAAAGCAAAAGCTGAGGCTGATACTGATGGTGCGCTTGAGGGGATTTCA 1500 
NLSWKAKAEADTDGALEGIS 

1501 AAAGACCAAGAAGTCAAAGCAGGCGAAACGGTAACCTTTAAAGCGGGCAAGAACTTAAAA 1560 
KDQEVKAGETVTFKAGKNLK 
• 

1561 GTGAAACAGGATGGTGCGAACTTTACTTATTCACTGCAAGATGCTTTAACGGGTTTAACG 1620 
VKQDGANFTYSLQDALTGLT 
• 

1621 AGCATTACTTTAGGTGGTACAACTAATGGCGGAAATGATGCGAAAACCGTCATCAACAAA 1680 
SITLGGTTNGGNDAKTVINK 

1681 GACGGTTTAACCATCACGCCAGCAGGTAATGGCGGTACGACAGGTACAAACACCATCAGC 1740 
DGLTITPAGNGGTTGTNTIS 
• 

1741 GTAACC AAAGATGGCATTAAAGC AGGTAATAAAGCTATTACTAATGTTGCGAGTGGTTTA 1800 
VTKDGIKAGNKAITNVASGL 

1801 AGAGCTTATGACGATGCGAATTTTGATGTTTTAAATAACTCTGCAACTGATTTAAATAGA 1860 
RAYDDANFDVLNNSATDLNR 

1861 CACGTTGAAGATGCTTATAAAGGTTTATTAAATCTAAATGAAAAAAATGCAAATAAAC AA 1920 
HVEDAYKGLLNLNEKNANKQ 

• • * • • . 

1921 CCGTTGGTGACTGACAGCACGGCGGCGACTGTAGGCGATTTACGTAAATTGGGTTGGGTA 1980 
PLVTDSTAATVGDLRKLGWV 

• • * • • . 

1981 GTATCAACCAAAAACGGTACGAAAGAAGAAAGCAATCAAGTTAAACAAGCTGATGAAGTC 2040 
VSTKNGTKEESNQVKQADEV 
• 

2041 CTCTTTACCGGAGCCGGTGCTGCTACGGTTACTTCCAAATCTGAAAACGGTAAACATACG 2100 
LFTGAGAATVTSKSENGKHT 

*••... 

2101 ATTACCGTTAGTGTGGCTGAAACTAAAGCGGATTGCGGTCTTGAAAAAGATGGCGATACT 2160 
I TVSVAETKADCGLEKDGDT 

• • • • • . 

2161 ATTAAGCTCAAAGTGGATAATC AAAACACTGATAATGTTTTAACTGTTGGTAATAATGGT 2220 
IKLKVDNQNTDNVLTVGNNG 

**.... 

2221 ACTGCTGTCACTAAAGGTGGCTTTGAAACTGTTAAAACTGGAGCGACTGATGCAGATCGC 2280 
TAVTKGGFETVKTGATDADR 
2281 GGTAAAGTAACTGTAAAAGATGCTACTGCTAATGACGCTGATAAGAAAGTCGCAACTGTA 2340
GKVTVKDATAND ADKRVATV 

• • • • • • 

2341 AAAGATGTTGCAACCGCAATTAATAGTGCGGCGACTTTTGTGAAAACAGAGAATTTAACT 2400 
KDVATAINSAATFVKTENLT 

• • • • • 

2401 ACCTCTATTGATGAAGATAATCCTACAGATAACGGCAAAGATGACGCACTTAAAGCGGGC 2460 
TSIDEDNPTDNGKDDA LKAG 

• . * • , • • • 

2461 GATACCTTAACCTTTAAAGCAGGTAAAAACCTGAAAGTTAAACGTGATGGAAAAAATATT 2520 
DTL TFKAGKNLRVKRDGKNI 

2521 ACTOTTGACTTGGCGAAAAACCTTGAGGTGAAAACTGCGAAAGTGAGTGATACTTTAACG 2580 
TFDLAKNLE VKTAKVSDT LT 

• • ♦ • ♦ • 

2581 ATTGGCGGGAATACACCTAGAGGTGGCACTACTGCGACGCCAAAAGTGAATATTACTAGC 2640 
IGGNTPTGGT TATPKVNITS 

, • ♦ • • • • 

2641 ACGGCTGATGGTTTGAATTTTGCAAAAGAAACAGCCGATGCCTCGGGTTCTAAGAATGTT 2700 
TADGLNF AKET ADASGSKNV 

• » • • • # 

2701 TATTTGAAAGGTATTGCGACAACTTTAACTGAGCCAAGCGCGGGAGCGAAGTCTTCACAC 2760
YLKGIAT TLTEPSAGAKSSH 

• »•••• 

2761 GTTGATTTAAATGTGGATGCGACGAAAAAATCCAATGCAGCAAGTATTGAAGATGTATTG 2820
VDLNVDATKKSNAAS IEDVL 

• • • . • . 

2821 CGCGCAGGTTGGAATATTCAAGGTAATGGTAATAATGTTGATTATGTAGCGACGTATGAC 2880 
RAGWNIQGNGNNVDYVATYD 

• • • • • • 

2881 ACAGTAAACTTTACCGATGACAGCACAGGTACAACAACGGTAACCGTAACCCAAAAAGCA 2940 
TVNFTDDSTGTTTVTVTQKA 

*••••• 

2941 GATGGCAAAGGTGCTGACGTTAAAATCGGTGCGAAAACTTCTGTTATCAAAGACCACAAC 3000 
DGKGADVKIGAKTSVIKDHN 

• * • ♦ • • 

3001 GGCAAACTGTTTACAGGCAAAGACCTGAAAGATGCGAATAATGGTGCAACCGTTAGTGAA 3060 
GKLFTGKDLKDANNGATVSE 

3061 GATGATGGCAAAGACACCGGCACAGGCTTAGTTACTGCAAAAACTGTGATTGATGCAGTA 3120 
DDGKD TGTGLVTAKTVIDAV 

3121 AATAAAAGCGGTTGGAGGGTAACCGGTGAGGGCGCGACTGCCGAAACCGGTGCAACCGCC 3180 
NKSGWRVTGEGATAETGATA 

3181 GTGAATGCGGGTAACGCTGAAACCGTTACATCAGGCACGAGCGTGAACTTCAAAAACGGC 3240 
VNAGNAETVTSGTSVNFKNG 

• * • * • • 

3241 AATGCGACCACAGCGACCGTAAGCAAAGATAATGGCAACATCAATGTCAAATACGATGTA 3300 
NATTATVSKDNGNINVKYDV 

3301 AATGTTGGTGACGGCTTGAAGATTGGCGATGAC AAAAAAATCGTTGCAGACACGACCAC A 3360 
NVGDGLKIGDDKKIVADTTT 

3361 CTTACTGTAACAGGTGGTAAGGTGTCTGTTCCTGCTGGTGCT AATAGTGTTAATAACAAT 3420 
LTVTGGKVSVPAGANSVNNN 
3421 AAGAAACTTGTTAATGCAGAGGGTTTAGCGACTGCTTTAAACAACCTAAGCTGGACGGCA 3480
KKLVNAEGLA TALNNLSWTA 

3481 AAAGCCGATAAATATGCAGATGGCGAGTCAGAGGGCGAAACCGACCAAGAAGTCAAAGCA 3540 
KADKYADGESEGETDQEVKA 

3541 GGCGACAAAGTAACCTTTAAAGCAG^ 3600 

GDKVTFKAGKNLKVKQSEKD 

3601 TTTACTTATTCACTGCAAGACACTTTAACAGGCTTAACGAGCATTACTTOAGGTGGTACA 3660 
FTYSLQDTLTGLTSITLGGT 

3661 GCTAATGGCAGAAATGATACGGGAACCGTCATCAACAAAGACGGCTTAACCATCACGCTG 3720 
ANGRNDTGTVINKDGLTITL 

3721 GCAAATGGTGCTGCGGCAGGCACAGATGCGTCTAACGGAAACACCATGAGTGTAACCAAA 3780 
ANGAAAGTDASNGNT I SVTK 

3781 GACTCCATTAGTGCGGGTAATAAAGAAATTACCAATGTTAAGAGTGCTTTAAAAACCTAT 3840 
DGISAGNKEITNVKSALKTY 

3841 JUfflnCR^^ 

KDTQNTADETQDKEFHAAVK 

3901 AACGCAAATGAAGTTGAGTOCGTGGGTAAAAACGGTGCAACCGTGTCTGC^^ 3960 
NANEVEPVGKNGATVSAKTD 

3961 AACAACGGAWttCAra^ 4020 
NNGKHTVTIDVAEAKVGDGL 

4021 GAAAAAGATACTGACGGCAAGATTAAACTCAAAGTAGATAATACAGATGGGAATAATCTA 4080 
EKDTDGKIKLKVDKTDGNKL 

4081 ^^^^^®*^®^^CAA^GGTGCATCCGTTGCCAAGGGCGAGTTTAATGCCGTAACA 4140 
LTVDATKGASVAKGEFNAVT 

4141 ACAGATGCAACTACAGCCCAAGGCACAAATGCCAATGAGCGCGGTAAAGTGGTTGTCAAG 4200 
TDATTAQGTNANERGKVVVK 

4201 G^TTCAAATGGTGCAACTGCTACCGAAACTGACAAGAAAAAAGTGGCAACTGTTGGCGAC 4260 
GSNGATATETDKKKVATVGD 

4261 GTTGCTAAAGCGATTAACGACGCAGCAACTTTCGTGAAAGTGGAAAATGACGACAGTGCT 4320 
VAKAINDAATFVKVENDDSA 

4321 ACGATTGATGATAGCCCAACAGATGATGGCGCAAATGATGCTCTCAAAGCAGGCGACACC 4380 
TIDDSPTDDGANDALKAGDT 

4381 TTGACCTTAAAAGCGGGTAAAAACTTAAAAGTTAAACGTGATGGTAAAAATATTACTTTT 4440 
LTLKAGKNLKVKRD GK NI T F 

4441 GCCCTTGCGAACGACCTTAGTGTAAAAAGCGCAACCGTTAGCGATAAATTATCGCTTGGT 4500
ALANDLSVKSATVSDKLSLG 

4501 ACAAACGGCAATAAAGTCAATATCACAAGCGACACCAAAGGCTTGAACTTCGCTAAAGAT 4560 
TMGNKVNITSDTKGLNFAKD 
4561 AGTAAGACAGGCGATGATGCTAATATTCACTTAAATGGCATTGCTTCAACTTTAACTGAT 4620 
SKTGDDANIHLNGIASTLTD 

• • • • • # 

4621 ACATTGTTAAATAGTGGTGCGACAACCAATTTAGGTGGTAATGGTATTACTGATAACGAG 4680 
TLLNSGATTNLGGNGITDNE 

•*•••* 

4681 AAAAAACGCGCGGCGAGCGTTAAAGATGTCTTGAATGCGGGTTGGAATGTTCGTGGTGTT 4740 
KKRAASVKDVLNAGWNVRGV 

• • » • • . 

4741 AAACCGGCATCTGCAAATAATCAAGTGGAGAATATCGACTTTGTAGCAACCTACGACACA 4800 
KPASANNQVENIDFVATYDT 

► ♦ - • . . 

4801 GTGGACTTTGTTAGTGGAGATAAAGACACCACGAGTGTAACTGTTGAAAGTAAAGATAAT 4860 
VDFVSGDKDTTSVTVESKDN 

• • • » * . 

4861 GGC AAGAGAACCGAAGTTAAAATCGGTGCGAAGACTTCTGTTATCAAAGACCACAACGGC 4920 
GKRTEVKIGAKTSV I K D H N G 

4921 AAACTGTTTACAGGCAAAGAGCTGAAGGATGCTAACAATAATGGCGTAACTGTTACCGAA 4980 
KLFTGKELKDANNNGVTVTE 

4981 ACCGACGGCAAAGACGAGGGTAATGGTTTAGTGACTGCAAAAGCTGTGATTGATGCCGTG 5040 
TDGKDEGNGLVTAKAVIDAV 

• . • . 

5041 AATAAGGCTGGTTGGAGAGTTAAAACAACAGGTGCTAATGGTCAGAATGATGACTTCGCA 5100 
NKAGWRVKTTGA NGQ NDDFA 

5101 ACTGTTGCGTCAGGC ACAAATGTAACCTTTGCTGATGGTAATGGCACAACTGCCGAAGTA 5160 
TVASGTNVTFADGNGTTAEV 

5161 ACTAAAGCAAACGACGGTAGTATTACTGTTAAATACAATGTTAAAGTGGCTGATGGCTTA 5220 
TKANDGS I TVKYNVKVADGL 

5221 AAACTAGACGGCGATAAAATCGTTGCAGACACGACCGTACTTACTGTGGCAGATGGTAAA 5280 
KLDGDKIVADTTVLTVADGK 

5281 GTTACAGCTCCGAATAATGGCGATGGTAAGAAATTTGTTGATGCAAGTGGTTTAGCGGAT 5340 
VTAPNNGDGKKFVDASGLAD 

5341 GCGTTAAATAAATTAAGCTGGACGGCAACTGCTGGTAAAGAAGGCACTGGTGAAGTTGAT 5400 
ALNKLSWTATAGKEGTGEVD 

• • ► • • , 

5401 CCTGCAAATTCAGCAGGGCAAGAAGTCAAAGCGGGCGACAAAGTAACCTTTAAAGCCGGC 5460
PANSAGQEVKAGDKVTFKAG 

5461 GACAACCTGAAAATCAAACAAAGCGGCAAAGACTTTACCTACTCGCTGAAAAAAGAGCTG 5520 
DNLKIKQSGKDFTYSLKKEL 

• ••••• 

5521 AAAGACCTGACCAGCGTAGAGTTCAAAGACGCAAACGGCGGTACAGGCAGTGAAAGCACC 5580 
KDLTSVEFKDANGGTGSEST 

• • • a • a 

5581 AAGATTACCAAAGACGGCTTGACCATTACGCCGGCAAACGGTGCGGGTGCGGCAGGTGCA 5640 
KITKDGLTITPANGAGAAGA 

»•••■. 

5641 AACACTGCAAACACCATTAGCGTAACCAAAGATGGCATTAGCGCGGGTAATAAAGCAGTT 5700
NTANTI SVTKDGI SAGNKAV 



5701 ACAAACGTTGTGAGCGGACTGAAGAAATTTGGTGATGGTCATACGTTGGCAAATGGCACT 5760
TNVVSGLKKFGDGHTLANGT 

• * • • " 

5761 GTTGCTGATTTTGAAAAGCATTATGACAATGCCTATAAAGACTTGACCAATTTGGATGAA 5820 
VADPEKHYDNAYKDLTNLDE 

• • # 

5821 AAAGGCGCGGATAATAATCCGACTGTTGCCGACAATACCGCTGCAACCGTGGGCGATTTG 5880 
KGADNNPTVADNTAATVGDL 

5881 CGCGGCTTGGGCTGGGTCATTTCTGCGGACAAAACCACAGGCGAACCCAATCAGGAATAC 5940 
RGLGWVI SADKTTGEPNQEY 

5941 AACGCGCAAGTGCGTAACGCCAATGAAGTGAAATTCAAGAGCGGCAACGGTATCAATGTT 6000 
NAQVRNANEVKFKSGNGINV

6001 TCCGGTAAAACATTGAACGGTACGCGCGTGATTACCTTTGAATTGGCTAAAGGCGAAGTG 6060
SGKTLNGTRVITFELAKGEV

6061 GTTAAATCGAATGAATTTACCGTTAAGAATGCCGATGGraCGGAAACGAACTTGGTTAAA 6120 
VKSNEFTVKNADGSETNLVK 

6121 GTTGGCGATATGTATTACAGCAAAGAGGATATTGACCCGGCAACCAGTWc^^ 6180 
VGDMYYSKEDIDPATSKPMT

6181 GGTAAAACTGAAAAATATAAGGTTGAAAACGGCAAAGTCGTTTCTGCTAACGGCAGCAAG 6240
GKTEKYKVENGKVVSANGSK

6241 ACCGAAGTTACCCTAACCAACAAAGGTTCCGGCTATGTAACAGGTAACCAAGTGGCTGAT 6300 
TEVTLTNKGSGYVTGNQVAD 

6301 GCGATTGCGAAATCAGGCTTTGAGCTTGGTTTGGCTGATGCGGCAGAAGCTGAAAAAGCC 6360 
AIAKSGFELGLADAAEAEKA

6361 TTTGCAGAAAGCGCAAAAGACAAGCAATTGTCTAAAGATAAAGCGGAAACTGTAAATGCC 6420 
FAESAKDKQLSKDKAETVNA 

6421 CACGATAAAGTCCGTTTTGCTAATGGTTTAAATACCAAAGTGAGCGCGGCAACGGTGGAA 6480
HDKVRFANGLNTKVSAATVE 

6481 AGCACTGATGCAAACGGCGATAAAGTGACCACAACCTTTGTGAAAACCGATGTGGAATTG 6540 
STDANGDKVTTTFVKTDVEL 

6541 CCTTTAACGCAAATCTACAATACCGATGCAAACGGTAATAAGATCGTTAAAAAAGCTGAC 6600 
PLTQIYNTDANGNKIVKKAD 

6601 GGAAAATGGTATGAACTGAATGCTGATGGTACGGCGAGTAACAAAGAAGTGACACTTGGT 6660 
GKWYELNADGTASNKEVTLG 

6661 AACGTGGATGCAAACGGTAAGAAAGTTGTGAAAGTAACCGAAAATGGTGCGGATAAGTGG 6720 
NVDANGKKVVKVTENGADKW 

6721 TATTACACCAATGCTGACGGTGCTGCGGATAAAACCAAAGGCGAAGTGAGCAATGATAAA 6780 
YYTNADGAADKTKGEVSNDK 

6781 GTTTCTACCGATGAAAAACACGTTGTCCGCCTTGATCCGAACAATCAATCGAACGGCAAA 6840 
VSTDEKHVVRLDPNNQSNGK 
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* 

6841 GGCGTGGTCATTGACAATGTGGCTAATGGCGAAATTTCTGCCACTTCCACCGATGCGATT 6900 
GVVIDNVANGEISATSTDAI 

6901 AACGGAAGTCAGTTGTATGCCGTGGCAAAAGGGGTAACAAACCTTGCTGGACAAGTGAAT 6960
NGSQLYAVAKGVTNLAGQVN

6961 AATCTTGAGGGCAAAGTGAATAAAGTGGGCAAACGTGCAGATGCAGGTACAGCAAGTGCA 7020 
NLEGKVNKVGKRADAGTASA 

7021 TTAGCGGCTTCACAGTTACCACAAGCCACTATGCCAGGTAAATCAATGGTTGCTATTGCG 7080
LAASQLPQATMPGKSMVAIA

7081 GGAAGTAGTTATCAAGGTCAAAATGGTTTAGCTATCGGGGTATCAAGAATTTCCGATAAT 7140
GSSYQGQNGLAIGVSRISDN

7141 GGCAAAGTGATTATTCGCTTGTCAGGCACAACCAATAGTCAAGGTAAAACAGGCGTTGCA 7200
GKVIIRLSGTTNSQGKTGVA

7201 GCAGGTGTTGGTTACCAGTGGTAAAGTTTGGATTATCTCTCTTAAAAAGCGGCATTTGCC 7260
AGVGYQW*

7261 SCSHTJTTATGGGTGGCTATTATGTATCGT 7291 
HA2 1 MNKIFNVIWNVMTQTWVVVSELTRTHTKRLRNR . GDPVLATLLFATVQA . 48

I I I I I I I I I I I : I I I I I I I I I I I I I I I I .. : : I I I I I I 111:1 
HA1 1 MNKIFNVIWNVVTQTWVVTVSELTRTHTKCASATVAVAVLATLLSATVEAN 50 

49 NATDEDEELDPWRTAPVLSFHSDKEGTGEKEVTENSNWGIYFDNKG. . . 95 

II ♦ I * • • I • ••11*11 . . . • i 

51 NNTP VTNKLKAY . . GDANFNFTNNSIADAEKQVQEAYKGLLNLNEKNASD 98 

96 ...VLKAGAITL . KAGDNLKXKQXTD 117 

I ... I I : . I : : I - . 

99 KLLVEDNTAATVGNLRKLGWVLS SKNGTRNEKSQQVKHADEVLFEGKGGV 148 

118 EXTNAS SFTYSLKKDLTDLTSVATEKLSFGANGD . .KVDI 155 

: I • • I . : I : . I I II . I :. I . : I :. : . II:: 

149 QVTSTSENGKHTITFALAKDLGVKTATVSDTLTIGGGAAAGATTTPKVNV 198 

156 TSDANGLKLAK TGNGNVHLNGLDS TL PDAVTNTGVLS S S S FTPND 200 

I I . . : I I I : I I . I . I I I I I : : I I I . I . : : 

199 TSTTDGLKFAKDAAGANGDTTVHLNGIGSTLTDTLVGSPATHIDG.GDQS 247 

201 VE KTRAATVKD VLN AGWN I KGAKT AG . . GNVES VDLVSAYNNVEFITGDK 248 
. . I I I I . : I i I II I I I I I I I . I . : : I . 1.11:1 . I : . I I I : • : I . 

248 THYTRAASIKDVLNAGWNIKGVKAGSTTGQSENVDFVHTYDTVEFLSADT 297 

249 NTLDVVLTAKENXKTTEVKFTPKTSVIKEKIXyKLFTGKENNDTNKVTSNT 298 
: I . I . : . . I I I I II I I : . : I I I II I I I I I II II I I . I . : I I I U : . * 

298 ETTTVTVDSKENGKRTEVKIGAKTSVIKEKTO^ 347

299 ATDNTDEGNGLVTAKAVIDAVNKAGWRVKTTTANGQNGDFATVASGTNVT 348 
I I : : . I I I . I I I I I I . I I I I II I . I I I : I I I • I I I I I I I II I II I I I II I 

348 ATEDADEGKGLVTAKDVIDAVNKTGWRIKTTDANGQNGDFATVASGTNVT 397

349 FESGDGTTASVTKOTNGNGITVKYDAKVGDGLKFDSDKKIVADTTALTVT 398
I . I I : I I I I . I I . : I : I I M I I I I I I II I I I : I : I I I . I I I I I I I I . 

398 FASGNGTTATVTNGT . . DGITVKYDAKVGDGLKLDGD . KIAADTTALTVN 444

399 G GKVAE I AKEDDKKKLVNAGDL VTALGNL S WKAKAEADTDGA 440 

: I I II : : I . . I : I I I I . I : I I M I . . I I I . . . I : I . 

445 DGKNANNPKGKVADVASTDE . KKLVTAKGLVTALNSLSWTTTAAEADGGT 493 

441 LEGISKDQEVKAGETVTFKAGKNLKVKQDGANFTYSLQDALTGLTSITLG 490 

1:1 . . : I I I I I I : . I I I I I I I I I I I I I : I II I I I I I I I I I I I I I I I I I I 
494 LIX5NASEQEVIUlGDir^FKAGKNLKVKQEGANFTYSLQDALTGLTSITLG 543 

491 GTTNGGNDAKTVINKDGLTITPAGNGGTTGTNTISVTKDGIKAGNKAITN 540 
1.1:111 I I I I I I I I I I I ..: I I I I I I I I I II • I I • - . : - I 
544 T- - .GNNGAKTEINKDGLTITPANGAGANNANTISVTKDGISAGC^SVKN 590

541 VASGLRAYDDANFDVLNNSATDLNRHVEDAYKGLLNIiNEIOTANKQ • PLVT 589 

MM: : : I I I I I . | • •- 1 I . : I • : : : I I I I I I I I : I I . . : II 1:1. 
591 WSGLKKFGDANFDPLTSSADNLTKQNDDAYKGLTNLDEKGTDKQTPWA 640 

590 DSTAATVGDLRKLGWWS 607 

I . I I I I I I II I 1111:1 
641 DNT AATVGDLRGLGWVT S 658 
Restriction maps of phage 11-17 and plasmid pT7-7 subclones

[Figure: restriction maps of subclones pHMW 8-3, pHMW 8-4, pHMW 8-5, pHMW 8-6, and pHMW 8-7; scale bar 1 kb.]

Restriction sites: B, BamHI; C, ClaI; E, EcoRI; N, NruI; P, PstI; X, XbaI.



[Figure: restriction maps of pDC400 and pDC601; labels shown: pUC19, pT7-7; scale bar 1 kb.]

FIG. 7



SUBSTITUTE SHEET (RULE 26) 



WO 96/30519 



19/26 



PCT/US96/04031 








           1
HA2        MNKIFNVIWN  VMTQTWVVVS  ELTR
HA1        MNKIFNVIWN  VVTQTWVVVS  ELTR
HMW1       MNKIYRLKFS  KRLNALVAVS  ELAR
HMW2       MNKIYRLKFS  KRLNALVAVS  ELAR
AIDA-1     MNKAYSIIWS  HSRQAWIVAS  ELAR
Tsh        MNRIYSLRYS  AVARGFIAVS  EFAR
SepA       MNKIYYLKYC  HITKSLIAVS  ELAR
Consensus  MNKIY--IWS  -VTQ-W-VS   ELAR

FIG. 11








[Figure: lanes 1 to 11; size markers at 12, 7, 5, 4, 3, 2, and 1.6 kb, shown for each of two panels.]

FIG. 12

[Figure: lanes 1 to 12.]

FIG. 13






  1 ATGAACAAAA TTTTTAACGT TATTTGGAAT GTTGTGACTC AAACTTGGGT
 51 TGTCGTATCT GAACTCACTC GCACCCACAC CAAATGCGCC TCCGCCACCG
101 TGGCAGTTGC CGTATTGGCA ACCCTGTTGT CCGCAACGGT TCAGGCGAAT
151 GCTACCGATG AAAACGAAGA TGATGAAGAA GAGTTAGAAC CCGTACAACG
201 CTCTGTTTTA AGGTGGAGCT TCAAATCCGC TAAGGAAGGC